SOME FACTORS RELATING TO SOCIAL
ACCEPTANCE IN EIGHTH-GRADE CLASSROOMS 
EDWARD A. TAYLOR 


Supervisor of Testing and Evaluation 
Alameda County Schools 
Oakland, California 


If we are to educate children to face a work-a-day world we 
must realize the importance of social acceptability in recreational 
situations, in work situations, and in the enforced intimacy of 
marital living. When we acknowledge the responsibility of 
educating for social acceptance, we should be prepared to 
objectify and evaluate our progress in attaining this goal. 

Present-day chronological age promotion policies in public 
schools are based on the assumption that children are more 
socially acceptable to those of their own chronological age than 
to those of their own mental age. The present investigation
includes a critical evaluation of this assumption. 

This study is intended to be both theoretical and practical. 
As a theoretical study it is submitted as a needed contribution 
to the literature. As a practical study an attempt has been made
to investigate certain relationships of possible interest to intelli- 
gence-test constructors and all school personnel using intelligence 
results in individual or group guidance. 


THE PROBLEM 


The investigation undertaken was a consideration of interrela- 
tionships between sex, age, intelligence and social acceptance of 
children in eighth-grade classrooms with varying social climates. 

The broad problem of investigating factors relating to class- 
room social acceptance of eighth-grade pupils was delimited to a 
consideration of relationships between sex, age, various aspects 
of intelligence and social acceptance according to peer judgment 
of eighth-graders. 
With sex held as a controlled variable, relationships were
investigated between the following measures of intelligence and 
social acceptance: 

A. Intelligence
   1. Mental age
   2. Mental bias
      a) non-verbal mental age minus verbal mental age
      b) absolute value of the difference between non-verbal
         mental age and verbal mental age
B. Social Acceptance
   1. Social acceptance by same sex
   2. Social acceptance by opposite sex

In addition to investigating relationships between these 
measures for unselected groups, parallel analyses were made for 
selected groups in traditional and Progressive classrooms.*


HYPOTHESES 


The following hypotheses were tested in the investigation: 

1) Pupils whose ability profile is biased non-verbally are more 
socially acceptable to their classmates than are pupils whose 
ability profile is biased verbally. 

2) There is no significant relationship between a pupil’s social 
acceptance and the flatness of his mental ability profile. 

3a) There is a significant positive relationship between mental 
age and social acceptance; and 

3b) This relationship is significantly more positive than that 
between chronological age and social acceptance. 

In an attempt to hold sex and group climate as controlled
variables each hypothesis was tested for both sexes for:

1. Unselected eighth-grade classrooms

2. Progressive eighth-grade classrooms 

3. Traditional eighth-grade classrooms 





*In order to identify traditional and Progressive classrooms, district 
administrators and county school supervisors were asked to rate partici- 
pating classrooms according to the following definitions: 1) The traditional 
classroom was defined as one in which discipline is autocratically handed 
down by the teacher, educational objectives are subject-matter-centered, 
and the importance of social adjustment of children is minimized. 2) A 
Progressive classroom was defined as one in which the techniques of class- 
room management and control involve stimulation of group standards of 
behavior and individual willingness to cooperate, educational objectives are
primarily social in nature with subject matter being a means to this end, 
and the importance of social adjustment of children is maximized. 
REVIEW OF RELATED INFORMATION


The earliest reported study on the influence of intelligence on 
social acceptance is that by Almack in 1922. Constructing a
simple sociometric device to determine the children’s choices of
companions, he correlated mutual friends’ CA’s, IQ’s, and MA’s.
He concluded that “there is a tendency for an individual in
choosing his associates to select from those of his own mental
level.” Others investigating the relationship between intelligence
and social acceptance are Partridge, Warner, Williams,
Bonney, and Pintner and others. Their conclusions in general
indicate a positive relationship between intelligence and social 
acceptance. 

Not all studies of social acceptance and intelligence agree with 
those above. Moreno (p. 91), Loeb, Challman, Furfey and
Hagman report negative results in relating intelligence to social
acceptance. 

Other investigators have reported relationships between intel- 
lectual profile and social acceptance. Perhaps the best known 
is Wechsler, who classified the verbal and performance intelligence
test profile characteristics of various clinical groups.

Earl, working with institutionalized adult male morons, also
used verbal and performance tests. When scores of subjects 
were expressed graphically, certain types of profile were found 
to be significant in the prognosis of social adjustment. Flat 
profiles indicated the best social prognosis; peaks in performance 
were next best, peaks in verbal tests had bad prognostic import. 
Other studies, also on the clinical level, are listed in Mayman’s 
sixty-one item bibliography of attempts to discover relationship 
between personality and unevenness of attainment on intelligence 
test items.

These studies are clinically-oriented and generally based on 
selected cases. Jones, working with an unselected group,
investigated the concomitance of neurotic tendency and the 
verbal factor in intelligence. She noted that those of high 
verbal intelligence tended to be better adjusted than those of 
low verbal ability. She also noted a tendency for the flat-profile 
group to be better adjusted than the extreme verbal group. 
Jones used the California Test of Personality as a criterion of 
adjustment. The reviews of this test in Buros have been
distinctly unfavorable, casting grave doubts on the validity of 
self-report inventories in classroom situations. Symonds points
out that: “By asking pupils to answer questions about themselves
one is securing evidence of only one kind of adjustment,
namely the pupil’s own attitudes. Adjustment may also mean
the reputation that a person has with others . . .” (p. 61).

Others with Symonds’ viewpoint have investigated the rela- 
tionship between scores on self-report types of personality tests 
and sociometric ratings by associates. Challman, Hagman,
and Koch have reported studies of pre-school children; Pintner
and Wellman have worked with elementary-school children;
Van Dyne has investigated secondary-school youth; Vreeland
and Richardson have published investigations on the college
and adult level, while Bonney has reported studies on the elementary,
secondary, and college levels. In all cases the findings
are essentially negative, correlations between self-report and 
sociometric scores being zero or very low. 

In view of the foregoing studies and conclusions, the criterion 
of social acceptance adopted in the present study is a social 
acceptance score based on classmates’ ratings for each child. 


DESCRIPTION OF SAMPLE AND DATA COLLECTED 


The sample selected was the eighth-grade population of the 
suburban and rural non-multigraded* schools in Alameda County, 
California, during the fall of 1949. Eighth-graders were chosen 
because they are the most mature pupils in day-long classroom 
contact and as a whole have been together longer than the pupils 
of any lower grade. 

Twenty-seven per cent of the sample are from small schools 
located in Mexican and Portuguese rural areas. Many of these 
children are bilingual, most of them from underprivileged
homes. The balance of the sample are from rapidly growing 
suburban districts where bilingualism is not much of a problem. 

The following data were obtained for each of the 1,177 children
in the thirty-eight classrooms investigated: (1) Name. (2) Classroom
identification. (3) Sex. (4) Age. (5) Mental age.
(6) Non-verbal mental age minus verbal mental age (index of
verbal or non-verbal mental bias). (7) Absolute value of the
difference between non-verbal mental age and verbal mental age
(index of flatness of the mental profile). (8) Social acceptance
by the same sex. (9) Social acceptance by the opposite sex.

* A non-multigraded class is one in which a single grade level is held, as
opposed to a multigraded class such as found in one-teacher schools.


TREATMENT OF DATA 


Age, intelligence (including profile indices) and social accept- 
ance data for each child were converted to standard scores based 
on individual classrooms. Since it was anticipated that pupils 
of some classes would be brighter than others, would be 
older, would be more verbal and so forth, the standard score 
technique was used to equate all groups as to central tendency 
and variability. The social acceptance scale yielded scores 
which were a function of the number of favorable ratings for 
each child, which in turn were a function of class size and sex 
proportion. The standard score technique also equated class 
size and sex proportion as variables. 

By yielding comparable measures for all variables studied, the 
standard score technique enabled the investigator to consider all 
children as being within one classroom. Relationships were 
investigated by correlating variables in standard score forms. 
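
The standard-score step described above can be sketched in Python. This is only an illustration of the idea, not a record of how the computation was carried out in the study: the function names and the sample pupils are hypothetical, and the target mean of 50 is an assumption (the article states only, in a later footnote, that all variables were equated to a standard deviation of ten).

```python
from collections import defaultdict
from math import sqrt


def standard_scores(records, key, mean_to=50.0, sd_to=10.0):
    """Convert one variable to standard scores within each classroom.

    `records` is a list of dicts with a "classroom" field and a numeric
    field named `key`. The target mean of 50 is an assumption; the target
    standard deviation of 10 follows the article's footnote.
    """
    by_room = defaultdict(list)
    for rec in records:
        by_room[rec["classroom"]].append(rec[key])

    stats = {}
    for room, values in by_room.items():
        mean = sum(values) / len(values)
        # Population standard deviation of the classroom's raw scores.
        sd = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        stats[room] = (mean, sd)

    scores = []
    for rec in records:
        mean, sd = stats[rec["classroom"]]
        z = 0.0 if sd == 0 else (rec[key] - mean) / sd
        scores.append(mean_to + sd_to * z)
    return scores


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical pupils: mental age in months and a raw acceptance score.
pupils = [
    {"classroom": "A", "ma": 150, "sa": 40},
    {"classroom": "A", "ma": 160, "sa": 55},
    {"classroom": "A", "ma": 170, "sa": 70},
    {"classroom": "B", "ma": 180, "sa": 90},
    {"classroom": "B", "ma": 190, "sa": 80},
    {"classroom": "B", "ma": 200, "sa": 100},
]

# Standardize within classrooms, then correlate as if all pupils
# belonged to a single classroom.
ma_std = standard_scores(pupils, "ma")
sa_std = standard_scores(pupils, "sa")
r_pooled = pearson_r(ma_std, sa_std)
```

Because each classroom is rescaled to the same mean and spread, differences between classrooms in average brightness, age, or class size do not enter the pooled correlation.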


INSTRUMENTS USED 


The California Short-Form Test of Mental Maturity, Inter- 
mediate, ’47 S-Form was used to gather intelligence data. 

Table I compares reliabilities published for the California Test 
of Mental Maturity with reliabilities determined by the present 
investigator in a previous unpublished study. Considering that 
the sample investigated was less variable with respect to IQ 
than the norm group, the reliabilities seemed comparable. 

The Ohio Social Acceptance Scale furnished criterion scores of 
social acceptance. This instrument is a rating scale with which 
each pupil, after indicating his sex, anonymously rates his classmates
on the following six-point scale:


Rating Criterion 
1) “My very, very best friends” 
2) “My other friends” 
3) “Not friends, but Okay” 
4) “Don’t know them” 
5) “Don’t care for them” 
6) “Dislike them” 
The ratings for each child were tabulated by sex of the raters.
The weighted sums of the ratings for each child were the raw 
social acceptance scores for that child.* As previously explained, 
these raw scores were then converted to standard scores based 
on individual classrooms. Ohio Social Acceptance Scale test- 
retest data on three classrooms are presented in Table II. 
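
The scoring rule can be illustrated with a short Python sketch. The weights for ratings 1 through 5 are those given in the article's footnote; the footnote as it survives lists no weight for rating 6, so the value of zero used here is an assumption, as are the function name and the sample ratings.

```python
# Weights for ratings 1-5 as listed in the article's footnote.
# The weight for rating 6 is NOT given in the source; 0 is an assumption.
RATING_WEIGHTS = {1: 15, 2: 10, 3: 5, 4: 2, 5: 1, 6: 0}


def raw_acceptance_scores(ratings):
    """Return (same_sex_score, opposite_sex_score) for one rated child.

    `ratings` is a list of (rater_sex, ratee_sex, rating) tuples, one per
    classmate who rated the child, with rating an integer from 1 to 6.
    """
    same_sex = 0
    opposite_sex = 0
    for rater_sex, ratee_sex, rating in ratings:
        weight = RATING_WEIGHTS[rating]
        if rater_sex == ratee_sex:
            same_sex += weight
        else:
            opposite_sex += weight
    return same_sex, opposite_sex


# Hypothetical ratings received by one girl from five classmates.
ratings_for_child = [
    ("F", "F", 1),
    ("F", "F", 2),
    ("F", "F", 4),
    ("M", "F", 3),
    ("M", "F", 5),
]

same, opposite = raw_acceptance_scores(ratings_for_child)
```

Since these sums grow with the number of same-sex and opposite-sex raters in a class, they are comparable across classrooms only after the conversion to within-classroom standard scores.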


TABLE I. PUBLISHED AND OBTAINED SPLIT-HALF RELIABILITY
COEFFICIENTS (SPEARMAN-BROWN CORRECTED) FOR THE
NEW CALIFORNIA SHORT-FORM TEST OF MENTAL
MATURITY: INTERMEDIATE ’47 S-FORM*

                          Published(a)   Obtained(b)

Non-language Factors          .887           .878
Language Factors              .931           .864
Total Mental Factors          .946           .881


* The reliabilities in Table I were obtained from split-half correlations
corrected by the Spearman-Brown formula. The resulting coefficient is that
which Cronbach (p. 65) calls the ‘coefficient of equivalence.’ It indicates
how much scores fluctuate from form to form of the same test. It is not to
be confused, as Cronbach points out, with the ‘coefficient of stability,’ the
reliability coefficient which indicates how much scores fluctuate on the same
questions from one time to another. In general, the coefficient of equivalence
tends to be higher than the coefficient of stability.

(a) N = 700, grades 7-10 inclusive; SD of IQ = 16.0; mean IQ = 100.0

(b) N = 297, all in seventh grade; SD of IQ = 12.6; mean IQ = 101.9
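
For readers unfamiliar with the correction, the Spearman-Brown formula for a test of doubled length is r_full = 2 r_half / (1 + r_half). The Python sketch below applies the formula and inverts it. The function names are illustrative, and the half-test correlation recovered from Table I is a back-calculation, not a figure reported by the author.

```python
def spearman_brown(r_half):
    """Estimate full-length reliability from a split-half correlation."""
    return 2 * r_half / (1 + r_half)


def implied_half_correlation(r_full):
    """Invert the correction: the half-test correlation implied by r_full."""
    return r_full / (2 - r_full)


# Back-calculate the half-test correlation implied by the obtained
# Total Mental Factors coefficient of .881 in Table I, then re-apply
# the correction to confirm the round trip.
r_half = implied_half_correlation(0.881)
r_full = spearman_brown(r_half)
```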


It will be noted that social acceptance is least stable in the 
Progressive classroom. This might reflect the mental hygiene 
follow-up between tests in the Progressive group where the most 
rejected child was referred for psychological case study and 
therapy. Other rejected children in the same group were under 
observation during the test-retest interval by a teacher who 
planned classroom experiences and activities to draw these 
rejected children back into the group. Upon retesting, previ- 
ously rejected children were found to have gained in status. 
It seems justifiable to assume that the remedial measures were 
effective in lowering the stability of the social acceptance ratings. 
In the other classrooms, where no remedial steps were taken, the 
social acceptance stability coefficient compares favorably with 
the coefficient of equivalence for intelligence. 





* The weights allowed for the ratings are: 
Rating Number 1 2 3 4 5 
Weight 15 10 5 2 1 
TABLE II. TEST-RETEST RELIABILITY COEFFICIENTS FOR THE
OHIO SOCIAL ACCEPTANCE SCALE IN THREE EIGHTH-GRADE
CLASSROOMS

                Lapsed      Class Size       Turnover
                Time        1st Test     Dropped  Entered      r
Traditional     4 months       31           3        5        .90
Unclassified    3 months       32           2        1        .89
Progressive     3 months       27           4        1        .66











The Ohio Social Acceptance Scale is discussed by Raths
(p. 142) who observes that the test has proved valid by differentiating
between criterion groups selected by teachers with
“better than average insight into the social adjustment” of their
pupils.

Pepinsky and Jennings (p. 533n) comment favorably on the
validity and stability of sociometric ratings, Jennings reporting
test-retest coefficients as high as .96 after a four-day interval.
Other investigators and their reported reliabilities are: Newstetter
and others, an average of .95; Zeleny, .93 to .95; Bonney,
from .67 to .84; and Northway, from .60 to .90.


PROCEDURE 


The California Test of Mental Maturity and the Ohio Social 
Acceptance Scale were administered by classroom teachers under 
supervision of the writer. The administration of the California 
Test of Mental Maturity is done by teachers as a matter of local 
policy. The administration of the Ohio Social Acceptance Scale 
was left in the hands of the classroom teacher in the interest of 
gaining proper pupil rapport. All scoring, tallying, summarizing
and interpretation were done in a central office under the supervision
of the writer.


FINDINGS 


As is usually the case with subjects of Mexican or Portuguese 
descent, the groups as a whole had higher non-verbal than verbal 
ability in a standard test. Of the thirty-eight classrooms con- 
sidered, twenty-seven were biased non-verbally. 
The means for chronological and mental-grade placement
varied in somewhat similar manner to these grade placements 
for individuals within an average classroom. The ranges of 
these means were: chronological age grade, from a low of 7.54 
to a high of 8.90; mental age grade, from 6.46 to 9.63; verbal age 
grade, from 6.22 to 9.33; and non-verbal age grade, from 6.55 to 
10.15. 

The Pearson coefficient of variability indicated that compared 
with traditional classrooms, Progressive groups were more 
variable by fourteen per cent in chronological grade placement 
and more variable by twenty-two per cent in mental grade place- 
ment. This may reflect genuine differences in promotion policy 
or merely the fact that the traditional classrooms were in one 
building with increased opportunity for homogeneous grouping. 
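
The comparison rests on the coefficient of variability, which is 100 times the standard deviation divided by the mean. A minimal Python sketch follows. The grade-placement figures are invented for illustration and are not the study's data, and reading "more variable by fourteen per cent" as a ratio of two coefficients is an assumption.

```python
from math import sqrt


def coefficient_of_variability(values):
    """Pearson's coefficient of variability: 100 * SD / mean."""
    mean = sum(values) / len(values)
    sd = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return 100 * sd / mean


# Hypothetical mental grade placements for two groups of pupils.
traditional = [7.5, 8.0, 8.5]
progressive = [7.0, 8.0, 9.0]

cv_trad = coefficient_of_variability(traditional)
cv_prog = coefficient_of_variability(progressive)

# Percentage by which the Progressive group is more variable.
pct_more_variable = 100 * (cv_prog / cv_trad - 1)
```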

An attempt was made to discover differences between Progres- 
sive and traditional classrooms insofar as patterns of social 
acceptance were concerned. No difference in leniency or
variability of ratings was found.

Intercorrelations between variables are presented in the accom- 
panying correlation matrices. The symbols used are as follows: 


CA      Chronological age
MA      Mental age
(A)     Index of mental bias
|A|     Index of flatness of mental profile
SA:s    Social acceptance by same sex
SA:o    Social acceptance by opposite sex


Correlations above diagonals are based on boys. Correlations 
below diagonals are based on girls, subscripts ‘B’ and ‘G’ 
indicating the number of boys and girls, respectively. 

In Matrix A the variable |A|, flatness of mental profile, exhibits 
low correlation with other variables. In the single significant 
case between |A| and (A), mental bias for boys, the scattergram 
was V-shaped. Because of this statistical artifact and the low 
correlations with other variables, rows and columns |A| are 
suppressed in Matrices B and C. 
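
The artifact is easy to reproduce. The absolute value of a difference is a V-shaped function of the signed difference, so the two indices are linearly correlated whenever the signed differences are not centered on zero. The Python sketch below uses made-up differences that lean non-verbal, as most classrooms in the sample did; the data and names are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical non-verbal minus verbal mental-age differences (months).
bias = [-12, -6, 0, 6, 12, 18, 24]
flatness = [abs(d) for d in bias]  # index of flatness: |difference|

# Off-center differences give a sizable linear correlation.
r_skewed = pearson_r(bias, flatness)

# With differences symmetric about zero the linear correlation vanishes,
# although the V-shaped dependence is just as strong.
symmetric = [-12, -6, 0, 6, 12]
r_symmetric = pearson_r(symmetric, [abs(d) for d in symmetric])
```

A correlation of this kind says nothing about pupils beyond the arithmetic link between the two indices.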


DISCUSSION AND INTERPRETATION 


Standard score data relating to chronological ages, mental 
ages, and indices of mental bias and flatness of mental profile 
were correlated in testing the hypotheses restated here along with 
relevant findings. 
1) Pupils whose ability profile is biased non-verbally are more
socially acceptable to their classmates than are pupils whose 
ability profile is biased verbally. 

Findings relating to this hypothesis are presented in Table III. 

MATRIX C
Correlation Matrix, Eighth-grade Pupils in Traditional
Classrooms (N_B = 63, N_G = 66)

          CA      MA      (A)     SA:s    SA:o
CA               -.22     .16    -.21     .o1
MA       -.34            -.41     .03    -.03
(A)       .16     .24             .16     .11
SA:s     -.04     .01     .30             .46
SA:o      .03     .08     .16     .65


TABLE III. CORRELATIONS BETWEEN SOCIAL ACCEPTANCE (BY
SAME SEX) AND NON-VERBAL MENTAL BIAS OF
EIGHTH-GRADERS IN TRADITIONAL, UNSELECTED,
AND PROGRESSIVE CLASSROOMS

                      Boys                      Girls
             Trad.   Unsel.   Prog.     Trad.   Unsel.   Prog.
N              63     593      62         66     584      71
r             .16    -.02     -.11       .30     .00      .20
Test ratio    1.3     .49      .86       2.4     .00      1.7


Table III indicates no significant relationship between social
acceptance and non-verbal bias. Although the null hypothesis
cannot be rejected, there is a possible tendency for the non-verbal
pupils to be better accepted in traditional classes.
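
The article does not define its "test ratio." The values in Table III are, however, reproduced by dividing r by its standard error under the null hypothesis, taken as 1/sqrt(N - 1). The Python sketch below makes that reading explicit. It is an inference from the tabled figures, not a formula stated in the source; the function names are hypothetical, and the 2.58 cutoff is the conventional normal deviate for the one per cent level.

```python
from math import sqrt


def critical_ratio(r, n):
    """Ratio of |r| to its assumed null standard error, 1 / sqrt(n - 1)."""
    return abs(r) * sqrt(n - 1)


def significant_at_one_percent(r, n, critical=2.58):
    """True when the ratio exceeds the two-tailed 1% normal deviate."""
    return critical_ratio(r, n) > critical


# Traditional-classroom girls in Table III: r = .30, N = 66.
ratio_trad_girls = critical_ratio(0.30, 66)

# Unselected boys in Table III: r = -.02, N = 593.
ratio_unsel_boys = critical_ratio(-0.02, 593)
```

On this reading the ratio for girls in traditional classrooms falls just short of the one per cent criterion, which fits the text's description of a possible tendency rather than a significant relationship.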

2) There is no significant relationship between a pupil’s social
acceptance and the flatness of his mental ability profile.

Findings relating to this hypothesis are presented in Table IV. 

Table IV indicates no significant relationship between social 
acceptance and flatness of mental ability profile. Even if the 
sexes were combined the test ratio would be too low to indicate 
significance at the one per cent level. Again the null hypothesis 
cannot be rejected.*





* Hypotheses 1 and 2 were also tested for each of the thirty-eight
individual classrooms. No significant correlations were found when the
classrooms were grouped into the categories of large vs. small schools,
rural vs. suburban and Latin vs. non-Latin.


TABLE IV. CORRELATIONS BETWEEN SOCIAL ACCEPTANCE (BY
SAME SEX) AND FLATNESS OF MENTAL ABILITY
PROFILES OF EIGHTH-GRADERS IN UNSELECTED
CLASSROOMS

               Boys     Girls
N               593      584
r              -.07     -.06
Test ratio      1.7      1.5


3a) There is a significant positive relationship between mental
age and social acceptance; and

3b) This relationship is significantly more positive than that
between chronological age and social acceptance.

Findings relating to these hypotheses are presented in Tables V
and VI.


TABLE V. CORRELATIONS BETWEEN SOCIAL ACCEPTANCE (BY
SAME SEX) AND MENTAL AGE OF EIGHTH-GRADERS IN
TRADITIONAL, UNSELECTED AND PROGRESSIVE
CLASSROOMS

                      Boys                      Girls
Group        Trad.   Unsel.   Prog.     Trad.   Unsel.   Prog.
N              63     593      62         66     584      71
r             .03     .18      .13       .01     .18      .17
Test ratio    .24     4.4      1.0       .08     4.4     1.42


Table V indicates significant, positive correlation for unselected
groups. For both boys and girls the correlation between mental
age and social acceptance is lowest in the traditional classrooms.
This suggests the possibility that bright children are more socially
acceptable in classrooms other than traditional ones.


TABLE VI. CORRELATIONS BETWEEN SOCIAL ACCEPTANCE (BY
SAME SEX) AND CHRONOLOGICAL AGE OF EIGHTH-GRADERS
IN TRADITIONAL, UNSELECTED AND PROGRESSIVE
CLASSROOMS

                        Boys                       Girls
Group        Trad.        Unsel.   Prog.   Trad.   Unsel.   Prog.
N              63          593      62       66     584      71
r            -.21         -.18      .00    -.04    -.13     -.02
Test ratio   [illegible]   4.4      .00     .32     3.1      .17


The correlations in Tables V and VI justify rejecting the null
hypotheses for unselected groups.* Since none of the correlations
between social acceptance and mental age were negative,
and none of the correlations between social acceptance and
chronological age were positive, one seems justified in accepting
hypotheses 3a and 3b as holding in each type of classroom.

* In interpreting correlation coefficients one must consider the range of
variables under consideration. In elementary classrooms the range of mental
ages is approximately two or three times greater than that of chronological
ages. The standard score technique equated all variables to a
standard deviation of ten.
In Matrix A the correlation between mental age and social acceptance
by the same sex is .18. Had mental ages, rather than standard scores,
been correlated, this would have risen to .42 while the correlations between
social acceptance and chronological age would have remained unchanged at
-.18 and -.13 for boys and girls, respectively. This would further justify
accepting hypotheses 3a and 3b.

Table VI indicates a tendency for the over-age boys to be less
rejected in Progressive classrooms.

The matrices were examined to discover trends of relationships
between the three types of classrooms. Certain tendencies
appeared, as shown in Table VII.


TABLE VII. CORRELATIONS BETWEEN CHRONOLOGICAL AGE,
VARIOUS ASPECTS OF MENTAL AGE AND SOCIAL
ACCEPTANCE FOR EIGHTH-GRADERS IN TRADITIONAL,
UNSELECTED AND PROGRESSIVE CLASSROOMS

Variables                  Type of Classroom
Correlated       Sex    Trad.   Unsel.   Prog.    Comment
MA and CA         F     -.34    -.25     -.14     Relationship has
CA and SA:s       M     -.21    -.18      .00     tendency to be
MA and (A)        F     -.24    -.10      .10     least positive in
MA and SA:o       F      .08     .14      .17     traditional classrooms.

SA:s and (A)      M      .16    -.02     -.11     Relationship has
SA:s and SA:o     F      .65     .59      .46     tendency to be
                                                  most positive in
                                                  traditional classrooms.

Key: CA, chronological age; MA, mental age; (A), index of mental
bias; SA:s, social acceptance by same sex; SA:o, social acceptance by
opposite sex.


Although the correlations in Table VII suggest tendencies in
passing from the traditional, to the unselected, to the Progressive
group, they are offered here with no comment. On the basis of
random chance alone, some matrix cells would follow this pattern.
Caution in singling out chance fluctuations and inferring
significance therefrom is recommended.


SUMMARY AND CONCLUSIONS


Interrelationships were investigated between sex, age, intelligence
and social acceptance of children in public school classrooms
with varying social climates. The California Test of
Mental Maturity was used to obtain data on mental ages and 
ability profiles of 593 boys and 584 girls in thirty-eight eighth- 
grade classrooms in rural and suburban schools. The Ohio 
Social Acceptance Scale, a peer judgment rating instrument, 
was used to obtain measures of social acceptance for these 
subjects. 

For each sex, relationships were investigated between social 
acceptance by each sex and: chronological age, mental age, and 
mental ability profile including mental bias and flatness of mental 
profile. In addition to investigating relationships between these 
measures for unselected groups, parallel analyses were made for 
selected groups in traditional and Progressive classrooms. 

The data indicate that: 

1) During an interval of three or four months social acceptance 
scores are as constant as IQ’s are usually reported to be in the 
literature. 
2) The lower constancy of social acceptance scores in Progressive
classrooms is consonant with an interpretation in terms of
the effectiveness with which teachers in these classrooms deal 
with social problems. 

3) There is no significant relationship between a child’s social 
acceptance and the bias of his ability profile, although the data 
suggest that non-verbal children may be more acceptable in 
traditional than in other types of classrooms. 

4) There is no significant relationship between a child’s social 
acceptance and the flatness of his ability profile. 

5) In unselected classrooms a significant positive relationship was
found between mental age and social selection.

6) Younger children are more acceptable to their classmates 
than are older children. 


IMPLICATIONS 


Present-day chronological age promotion policies in public 
schools are based on the assumption that pupils are more socially 
acceptable to those of their own chronological age than to those 
of their own mental age. In present-day grade-groupings 
resulting from this policy, the assumption is not justified by the 
evidence. 

In grade-groupings resulting from a policy of ability promotion 
the relationships found here might not hold, although the present 
findings agree with those of Almack in 1922. 


SUGGESTIONS FOR FURTHER RESEARCH 


Classrooms were classified as Progressive or traditional 
according to administrators’ and supervisors’ ratings. More 
objective classification might have been obtained had the 
teachers taken an attitude test such as the Progressive Education 
Association’s Attitudes Toward School Life. 

Since the development of pupil skills in attaining social adjust- 
ment is accepted as an educational objective, it becomes a teacher 
responsibility to help socially-rejected children attain satis- 
factory status within the classroom. The Ohio Social Acceptance 
Scale objectifies status changes within groups. This acceptance 
scale or other sociometric device might be used to evaluate 
teacher effectiveness in helping socially-rejected children within 
the classroom. 

Almack as early as 1922 used a sociometric test to identify 
mutual friends in the classroom. For these pairs he correlated
mental and chronological ages. This suggests another approach 
in studying patterns of mental age and chronological age under 
varying classroom climates. 

The literature reports zero or very low correlation between 
sociometric ratings and scores on self-inventory personality tests. 
Sociometric ratings are suggested as criteria for
validating personality test items.
A PROJECTIVE TECHNIQUE FOR MEASURING
POSITIVE AND NEGATIVE ATTITUDES 
TOWARDS PEOPLE IN A REAL-LIFE 
SITUATION 
FARRELL L. HOLLINGSWORTH
WILLIAM E. HALL 
University of Nebraska 


There has appeared in recent psychological literature an 
intensification of emphasis on investigating the individual’s 
‘frame of reference’ as a point of departure in the study of human 
personality. This emphasis in terms of an adequate understand- 
ing of human relations is long overdue. Although ‘frame of refer- 
ence’ has been recognized in the perceptual field and especially in 
the studies of vision, the consideration given it in determining why 
one behaves towards others as he does is of recent origin.

The present study is an attempt to measure an aspect of an 
individual’s ‘frame of reference’ in terms of positive and negative 
attitudes by scoring his recorded observations of people in a 
real-life situation. The subjects used were all enrolled in 
Teachers’ College, and the expectation is that the results will 
eventually be meaningful for use in teacher-training. It would 
seem to be of some real value to know whether prospective teach- 
ers do have a generalized attitude toward people and, if so, 
whether it can be measured. A good teacher deals in a wide 
variety of human relations. He must try to understand and 
work with each pupil individually. He must try to understand 
and work with the pupil when he is with other pupils, sometimes 
when he is with his parents, and sometimes when he is out in the 
community. The teacher must work with parents and fellow 
teachers in the same variety of relationships which have been 
mentioned in regard to the pupil. Success in teaching
depends on how well the teacher can cope with these relations.
If a generalized attitude does exist and if it can be determined
how this attitude affects human relations, a step forward
in the educational process will be the result. 

This study is one of a series whose over-all purpose is to develop
a technique which may be used to measure attitudes by studying
an individual’s projected feelings in a real-life situation. Three
steps have been planned for the development of such a technique: 

1) A situation where a large group can be adequately moti- 
vated to record the behavior of people as they perceive it in a 
real-life situation. 

2) A system of interpreting and evaluating such projections 
that are made by the recording group. If this system is to be 
useful, it must be set up so that it will be meaningful to anyone 
desiring to work with it. 

3) A careful study of the group who have recorded their 
projections. 

This study is a beginning toward accomplishing the first two 
steps and a second study is under way attempting to accomplish 
the third. A systematic attempt was made to get the subjects to 
cooperate in recording their feelings toward other people. To
further achieve the conditions implied in the second step an 
objective method of evaluation has been worked out which 
increases the reliability in the scoring. 

Test Technique Selected.—A projective method was selected 
as the test technique to be used in this study. One of the advan- 
tages of using a projective method is that the subject (the person 
working on the assignment) has an opportunity to evaluate peo- 
ple without being aware that he will be judged on what he writes. 
Thus, the individual may express to various stimuli any feelings 
that he wishes. The authors feel that the closer the stimuli to 
which the subject expresses his feelings are to real-life situations 
the more accurate may be the inferences from the results. In this 
study people in a department store have been used as stimuli. 
This method will subsequently be referred to as the ‘Life-Situa- 
tion Technique.’ 

The ‘Life-Situation Technique’ has an advantage in the amount 
of time needed for administration. The use of a test in a teacher- 
training program must depend somewhat on the amount of time it 
requires. The ‘Life-Situation Technique’ can be given as an 
assignment to a rather large group of students to be carried out at 
a specified time and then scored at one’s convenience. The 
assigning, observing, and writing-up of what has been seen has 
the advantage of a group test because individual supervision is not 
required as in individual tests. 

Background of Research and Literature.—Much has been written 
with regard to projective techniques, but the technique of using
life-situations, described in this study, is, to the authors’
knowledge, unique. However, there are certain aspects of this study
that resemble techniques used in previous studies. Rotter and
Willerman² have suggested a method of scoring responses to an
incomplete sentence test which is somewhat similar to the one 
used in this study. They used three categories: 1) conflict, 
2) positive and 3) neutral, to classify the written statements. 
The authors concluded that the method has potentiality as a 
means for studying attitudes in which freedom of response and 
reasonably objective scoring can be obtained. 

The present research grew out of a study carried on by Markussen¹
and Warren.⁴ They used real-life situations as stimuli for a
projective technique which was designed to consider the methods 
people use in human relationships and to discover an instrument 
for measuring them. In their analysis they noted individual 
differences among the subjects in their attitude toward the people 
observed. They noticed the tendencies of some subjects to 
record predominantly the strong points of the people observed 
whereas others found much fault with all people they observed. 
It is important to point out here that the factors of attitude, which
are the basis for the present study, were tendencies actually
observed in earlier research rather than hypothetical tendencies assumed
to exist in the experimental data.

The Situation Selected.—The data for the present research were 
collected in a real-life situation. The area chosen for the subjects 
to observe people was Gold’s Department Store of Lincoln, 
Nebraska. This is the largest and the most popular shopping 
center in the city, which has a population of approximately 100,- 
000. The store, in the downtown business district, is a block 
long and one-fourth block wide and has six floors. People of all 
economic levels shop here, particularly those of the middle class. 
Arrangement for the observations was made with the manage- 
ment, and the subjects were free to take notes as they observed. 

The Subjects Selected. The subjects used in this study were
from the freshman class in the Teachers College at the University
of Nebraska and were registered for the required orientation
course, Education 30, in the fall term of the academic year
1948-1949. The course, Education 30, was concerned with
several factors: 1) What is teaching? 2) Who makes a good
teacher? and 3) Will I (referring to the student) make a good
teacher? During the course of study emphasis was placed upon 
the importance of observing people in order to understand them. 
Thus, the assignment to observe and record people’s behavior 
became the raw data for the ‘Life-Situation Technique.’ There 
were two hundred ninety-nine students enrolled in this course. 
One class did not participate and the observations of another 
class could not be found. Six students who were on the original 
roster either dropped the course or did not turn in a paper. The 
number of students used in this study was two hundred twenty. 
Of this group one hundred thirty-seven or sixty-two per cent were 
girls and eighty-three or thirty-eight per cent were boys. These 
percentages agreed closely with the total percentages of boys and
girls in the entire enrollment of Teachers College. 


DIRECTIONS TO THE SUBJECTS 


The instructors of Education 30 were given the following directions
for making the assignment: (All of the instructors coöperated
in making this assignment.)


INSTRUCTOR: Read these directions to your class. You
should read them twice. When questions arise, just read the 
part that applies one more time. Answer no other questions. 
Give near the end of the period. When the question arises as to 
the length of the paper you should suggest that you do not see 
how an hour of alert observation could be written up in any less 
than two pages. You may tell them, “In the past many stu- 
dents have written four, five, six pages, or more. You will find 
no lack of material after you have been there for an hour.” 


Directions To Be Read 


During this course we will be emphasizing that teachers must 
be students of human behavior. Everyone should spend some 
time observing and analyzing human behavior, but it is doubly 
important for anyone considering teaching to make use of many 
opportunities for practice in sharpening his ability to describe 
and judge other people. Therefore, we are giving you this 
opportunity to start developing one of the most essential skills 
of teaching. During the next four days everyone in Education 30 
is required to spend an equivalent of one classroom hour at 
Gold’s Department Store observing the people. You may
observe at any time when the store is open. You should take 
notes while you observe and then turn in a report describing what 
you saw and your impressions of what you saw. You will 
probably get the most from this assignment if you note and 
analyze the behavior of as many different people as possible. 
Please put on your paper the date and at what time you were 
there. 

This is a standard assignment for Education 30 students so you 
need not be concerned as to the reaction of the clerks and the 
management of the store. They will be expecting you. 


It can be seen from the directions that students were free to go to the
department store when they wished and that there were no
predetermined groups of observers. The papers indicated that the
students went at all business hours with the modal time recorded 
at 4:00 p.m. Some of the students went in pairs, but their 
descriptions were very different; in fact, it was difficult to know 
that they had gone in pairs except that they reported it. The 
subjects observed and discussed any people that were seen, such as 
clerks, customers, and floor-walkers. 

Motivation of the Subjects—The assignment was made at a 
time when it best fitted in with the objective of the course. The 
length of the papers returned ranged from sixty-five to one thou- 
sand thirty words with an average of three hundred twenty words 
per paper. In any study of this kind motivating the students 
would seem to be important. Even though this assignment was 
required of everyone it was found that of a sample of fifty papers 
of persons whose last names began with letters ‘Ne’ through ‘Th,’ 
nineteen subjects (thirty-seven per cent) in their first or last
sentence referred to the trip as “one of the most enjoyable school
experiences,” “interesting,” or “fun.”

The Scoring of the Papers.—After the assignment had been 
completed the authors read the papers to determine how they 
might be scored so as to evaluate the positive and negative atti- 
tudes which might have been reflected. The analysis of the 
papers seemed to indicate that the positive and negative responses 
could be identified. To discover whether the positive and nega- 
tive responses could be generally agreed upon, sixty responses 
assumed positive and sixty-two assumed negative were extracted 
and typed on 3 × 5 cards. Nine persons not connected with the
study sorted the responses with an average agreement of ninety- 
seven per cent with the authors’ original assumptions. Thus it 
appeared that there was general agreement on whether a ‘charged’* 
response was positive or negative. On the basis of these sorted 
responses an attempt was made to define positive and negative 
responses. Positive responses were statements by the subject
that indicated that he was in some way ‘for’ the person or
persons observed, that he saw their strong points, that he noticed
the acceptable aspects of a person, and that he attributed a desirable
quality, thought, or motive to the person.

Negative responses were statements by the subject in which he
gave some evidence that he was ‘against’ people, such as mentioning
aspects of the person or situation that were not generally
considered desirable or that detracted from the observed person’s 
prestige—finding fault or conflict among the observed, or attrib- 
uting a motive that was considered undesirable. Although these 
categories are not sharply defined it was found in classifying them 
that independent scorers had high agreement. 

After the statements were classified as positive or negative 
they were sorted into three different categories, Alpha, Beta, and 
Gamma. These different categories appeared to the authors to 
be degrees of positivity and negativity, but at the present there is 
no evidence to support this assumption. The Alpha type 
responses included objective descriptions of what was seen. 
There was some interpretation, of course, but responses were 
included in this category when they referred to some physical 
aspect of the observed person, rather than attributing qualities 
to the person as a whole. Examples of Alpha responses are:


positive: 1) He was neat and well dressed. 2) She had 
beautiful gray hair. 

negative: 1) He wore a ragged coat. 2) I saw a man with his 
face all scarred up. 


The Beta type responses were considered to be more subjective. 
Here the subject usually attributed a quality to the person or 
described the condition of the person. Examples of the Beta 
type are: 





* Positive and negative responses when referred to collectively are called 
‘charged’ responses. 
positive: 1) The clerk seemed to have a very pleasing personality.
2) The mother was courteous to the clerks.

negative: 1) He left with an embarrassed look on his face. 
2) The clerk, thoroughly disgusted, began bringing black boots. 


The Gamma type responses are those in which the subject 
verbally indicates an extension of his own ‘feelings’ and patterns 
of thinking to the person being observed. For example, the
subject may observe in the positive, “The lady helped the child
because she wanted her to be happy,” or, in the negative, “The
lady did not really want to buy anything; I think she was just up
there to aggravate the clerks.”

Both the Alpha and Beta categories were further divided into 
movement and people references. If the description referred to 
people in motion, then the response was placed in the ‘movement’ 
category. For example, “The middle-aged lady helped her
mother around the furniture department,” would be classified
as positive, movement, Alpha, whereas “He looked very sad”
would be classified as negative, people, Beta.

The above five categories were then divided into a specific 
and a generalized category. The specific category included all 
the statements where the specifically observed person or persons, 
as such, were described rather than generalizing the observation to 
many people. A statement such as “The man was very rude”
would be put in the ‘specific’ category, whereas “People are very
rude” would be placed in the generalized category.

Excluding the positive and negative classifications, there were 
ten categories that can be summarized by the following outline: 


                        Specified    Generalized

A) Alpha
   (1) People               x             x
   (2) Movement             x             x
B) Beta
   (1) People               x             x
   (2) Movement             x             x
C) Gamma                    x             x


All the above ten categories can be classified as either positive
or negative. Thus, there were twenty possible classifications.

For the scoring of the papers a manual was devised that contained
ten samples of each of the above categories. This manual
afforded the scorers an opportunity to compare ‘charged’ statements
on a student’s paper with those of the category. This
procedure was believed to have increased the objectivity of the 
scoring. Where there are twenty possible classifications and the
categories are not sharply defined, the question
arises whether two people scoring the papers independently 
could get approximately the same results. To check the reliabil- 
ity of the scoring, fifty papers were set aside at the beginning of 
the study and scored independently by two of the authors after 
they were familiar with the method. The task for each scorer 
was, first, to choose the statement from the paper that was either 
positive or negative and, then, to place it in the fitting category. 
The results of this independent scoring indicated an agreement of 
seventy-nine per cent. Part of the disagreement was due to the 
mechanical errors of the scorers. Although agreement was not
complete, the authors felt that it was substantial
enough to indicate that the method of classifying was
sufficiently standardized for exploratory work. 
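
The percentage of agreement reported above can be illustrated with a small computation. The following is only a sketch under stated assumptions, not the authors' own procedure: each scorer's sheet is represented as a mapping from line number to category label, a line flagged by only one scorer is counted as a disagreement, and all labels and data shown are hypothetical.

```python
def percent_agreement(scorer_a, scorer_b):
    """Percentage of flagged lines on which two scorers chose the same category.

    Assumption (not stated in the study): a line flagged by only one of the
    two scorers is counted as a disagreement.
    """
    lines = set(scorer_a) | set(scorer_b)
    if not lines:
        return 0.0
    matches = sum(1 for n in lines if scorer_a.get(n) == scorer_b.get(n))
    return 100.0 * matches / len(lines)


# Hypothetical score sheets for one paper: line number -> category label.
sheet_a = {
    3: "negative/Alpha/People/specific",
    7: "positive/Beta/People/specific",
    12: "negative/Gamma/specific",
    15: "negative/Beta/Movement/specific",
}
sheet_b = {
    3: "negative/Alpha/People/specific",
    7: "positive/Beta/People/specific",
    12: "negative/Beta/People/specific",
    15: "negative/Beta/Movement/specific",
}

print(percent_agreement(sheet_a, sheet_b))
```

In this made-up example the scorers agree on three of the four flagged lines.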

In the scoring, the lines of all the papers were numbered and 
the number of the line containing a charged response was placed 
on a score sheet similar to the outline shown above. All the
papers used in this study were scored by two of the authors and 
then any disagreements were judged by all three.


RESULTS 


Sixty-eight per cent of the responses fell into four of the ten 
categories; namely, in the Alpha and Beta categories which con- 
tained responses referring specifically to people. The People 
Beta category which included attributing qualities to persons 
contained twenty-two per cent of the responses—the largest 
per cent of any of the categories. It is interesting to note that 
the ratio of positive responses to negative responses in the People 
categories was .57, while the comparable Movement ratio was .26. 
These ratios indicate that the subjects tended to respond more 
negatively about a person’s actions than about the person himself. 
Of the 2173 responses scored, 614 were positive and 1559 were
negative. Thus, 28.3 per cent of the responses were positive
and 71.7 per cent were negative. This result agrees closely with
the finding by Markussen¹ that seventy-two per cent of the
potentiality statements scored by her were negative. The process
of scoring the ‘Life-Situation Technique’ papers for the Teachers
College Freshman class of 1949-1950 is now being carried out and 
the same negative trend is being found. 
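
The percentages of positive and negative responses follow directly from the two counts reported above. A short check, using only those counts (the variable names are illustrative):

```python
# Counts of charged responses as reported in the text.
positive = 614
negative = 1559
total = positive + negative

# Percentages rounded to one decimal place.
pct_positive = round(100 * positive / total, 1)
pct_negative = round(100 * negative / total, 1)

print(total, pct_positive, pct_negative)
```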

Another interesting finding was that fifty-one subjects made completely
negative responses as compared to only seven who made
completely positive responses. One hundred ninety-eight made
at least as many negative responses as positive, and
twenty-two made more positive responses than negative.

There were no significant sex differences found in the number
or type of responses. The critical ratio for the difference between
the sexes in mean total number of charged responses was found to
be 1.08; significance is found at 1.97 for the five per cent level of
confidence.
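
The form of the critical ratio is not spelled out in the text. A minimal sketch, assuming the usual large-sample form (the difference between two means divided by the standard error of that difference), is given below. The group sizes are those reported for the study; the means and standard deviations are hypothetical placeholders, since the study does not report them.

```python
import math


def critical_ratio(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Difference between two means divided by the standard error of the difference.

    Assumes independent groups and the large-sample standard error
    sqrt(sd_1**2 / n_1 + sd_2**2 / n_2).
    """
    se_diff = math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_1 - mean_2) / se_diff


# Group sizes from the study (137 girls, 83 boys); means and standard
# deviations below are hypothetical and serve only to show the call.
cr = critical_ratio(mean_1=10.4, sd_1=5.5, n_1=137,
                    mean_2=9.6, sd_2=5.2, n_2=83)

# Compare against the 1.97 threshold quoted in the text.
print(round(cr, 2), abs(cr) >= 1.97)
```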


A METHOD FOR REPRESENTING THE RESULTS 


In presenting individual results it seems that a profile and a 
score in which the various categories are weighted would be desirable.
However, a simple tentative method that might be used
to indicate relationships of this technique to other criteria has 
been developed. The results of the scoring were quantified 
first by finding the number of positive and then the number of 
negative responses on a subject’s paper. The positive and negative
scores were combined into one score by subtracting the
negative from the positive, dividing by the total number of
words on the paper, and multiplying the result by 100.
This gave the number of positive responses in excess
of the negative per one hundred words of narrative. By this
method the obtained scores ranged from −11.0 to +4.5, with
the median at −1.45. Many different methods of combining
the scores were tried but this method seemed to represent best the 
content of the individual papers. 
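
The combined score described above can be written out as a short function. This is a sketch of the stated arithmetic only; the function name and the example paper are hypothetical.

```python
def net_positive_per_100_words(n_positive, n_negative, n_words):
    """Excess of positive over negative responses per one hundred words.

    Follows the description in the text: (positive - negative) divided by
    the number of words on the paper, multiplied by 100.
    """
    if n_words <= 0:
        raise ValueError("a paper must contain at least one word")
    return 100.0 * (n_positive - n_negative) / n_words


# Hypothetical paper of 320 words with 3 positive and 8 negative responses.
print(net_positive_per_100_words(3, 8, 320))
```

A negative score means that negative responses outnumber positive ones on that paper.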


AN INDICATION OF VALIDITY 


As previously stated, no validation studies have been com- 
pleted to date, but two students in one of the authors’ classes 
displayed extremely negative behavior in the classroom. They 
were negative to ideas, assignments, and to the University in 
general. Their behavior suggested the idea of checking their 
scores on the ‘Life-Situation Technique.’ These two were among
the fifty-one students who had given completely negative projections.
A check with members of the fraternal groups to which
they belonged brought forth a description of a high degree of
negativity toward house activities and the members in general. 
It would seem that this slight evidence suggests the desirability of 
a careful validation study which is now in progress. 


SUMMARY AND CONCLUSIONS 


In this research two hundred twenty freshmen in the Teachers 
College of the University of Nebraska observed people in a real- 
life situation and wrote up their observations. 

The written observations were scored for positive and negative 
responses. There were twenty different categories in which a 
charged statement could be placed. It was found that these 
charged responses could be selected from the papers by two 
independent judges and placed in the twenty categories with an 
agreement of seventy-nine per cent. 

To determine whether the responses classified as either positive 
or negative in this study could be generally agreed upon, one 
hundred twenty-two statements (sixty positive, and sixty-two 
negative) were sorted by nine independent judges not connected 
with the study. There was an agreement of ninety-seven per 
cent with the authors’ original assumptions. 

Seventy-one and seven-tenths per cent of the 2173 total charged
responses were negative. On fifty-one of the two hundred twenty
papers only negative responses were found, whereas seven persons
made only positive responses.

The physical descriptions of people as such were more positive 
than the descriptions of the movements of people. This study 
does not indicate whether the subjects describe action more nega- 
tively or whether the action actually is more negative. 

As a result of this study further research is being carried on in 
order to discover the reliability and the validity of the ‘Life- 
Situation Technique.’ The study now being pursued is designed 
to discover whether the frame of reference as measured by the 
‘Life-Situation Technique’ is the same at two different but closely 
consecutive times, and to find whether any relationships exist 
between extreme scores on the ‘Life-Situation Technique’ and 
other aspects of the subject’s life, such as popularity in groups, 
his use of time, his stated goals in life, ability and scholastic 
achievement. 
Plans are being made to extend this study to other sections
of the country and to other groups of people. 


REFERENCES 


1) Marilyn Elizabeth Markussen, Projective Techniques, A Possible 
Way of Measuring Individuals’ Reactions in the Area of Human Relations, 
Unpublished Master’s thesis, University of Nebraska, 1948. 

2) Julian B. Rotter and Benjamin Willerman, “The Incomplete
Sentences Test as a Method of Studying Personality,” Journal of
Consulting Psychology, Vol. XI, 1, Jan.-Feb., 1947, pp. 43.

3) Muzafer Sherif and Hadley Cantril, The Psychology of Ego- 
Involvements, New York: John Wiley & Sons, Inc., 1947, chaps. 3 and 4, 
pp. 29ff. 

4) Phyllis Louise Warren, Two Projective Techniques Proposed As A
Measurement of the Methods Individuals Use in Solving Problems in the
Area of Human Relations, Unpublished Master’s thesis, University
of Nebraska, 1948.


TEACHERS’ ATTITUDES TOWARD THE 
BEHAVIOR PROBLEMS OF CHILDREN* 


JACK NORMAN SPARKS

The relative degree of importance teachers assign to behavior
problems exhibited by children has been a matter of major concern
among educators since the well-known investigation by
Wickman.¹⁰ Since that time his study has been widely interpreted
as showing that teachers do not have an adequate understanding
of the relative importance of the various behavior
problems of children. The Wickman study has been criticized
on various points by Watson,’ Peck,⁷ and Klein.⁴ One of the
more important criticisms has been that Wickman gave different 
directions to the teachers and mental hygienists used as raters. 
He asked the teachers to rate the problems on the basis of the 
degree to which the presence of a particular behavior character- 
istic made a child a problem in classroom management. In 
other words the directions to the teachers tended to ask for the 
problems which caused the most immediate trouble. Wickman 
also presented the same list of behavior problems to a group of 
mental hygienists and asked them to rate the problems on the 
basis of how seriously the problems were likely to affect the future 
of the child exhibiting them. The ratings of the teachers and 
of the mental hygienists were then compared as if the basis for 
rating had been the same in both cases. Wickman concluded 
that teachers stress the importance of problems relating to sex, 
dishonesty, disobedience, disorderliness, and failure to learn and 
saw little significance in the problems that indicate withdrawing 
characteristics in children. Needless to say, the mental hygien- 
ists considered the unsocial forms of behavior most important 
and attributed little importance to the problems teachers con- 
sidered important. 

Since the time of the Wickman study, several studies have 
been made which substantiated Wickman’s findings. Examples 
of these are the studies by McClure,⁶ Dickson,² Yourman,¹¹
Boynton and McGaw,¹ Laycock,⁵ Peck,⁷ and Epstein.³





* Partial report of a Master’s thesis, done under the direction of Dr. J. B. 
Stroud. 
The present study represents an attempt to determine whether
or not the difference in directions given to teachers and to mental 
hygienists in the Wickman study was likely to have influenced 
the relative ratings given by the two groups. One question 
raised, then, was: Do teachers rate the same problems as most 
detrimental to the future of the child exhibiting them that they 
rate as most troublesome in the school situation? Another 
consideration was the question of whether or not the results of 
earlier studies could be applied to the teachers of today. An 
attempt was also made to determine whether or not amount of 
education and years of teaching experience had any relation to 
teachers’ attitudes toward these problems. The educational 
difference was also considered by comparing the ratings of 
teachers with the ratings of a group of graduate students at the 
State University of Iowa. 

The problems the teachers were asked to rate are those used 
by Wickman.¹⁰


PROCEDURE 


Two forms of a questionnaire were developed, each containing 
the fifty behavior problems from Wickman’s list. Form I was 
developed to sample the attitudes of teachers toward behavior 
problems in relation to the seriousness to the future adjustment 
of the child. Form II represented an attempt to determine 
which problems teachers felt were most troublesome to them. 
Ratings were on a five-point scale, five being highest. 

An attempt was made to secure a representative sample of 
teachers in Iowa by dividing the school systems in Iowa into five 
classes according to size and type of community: 

a) schools in cities of 30,000 and over in population, 

b) schools in cities of from 5,000 to 29,999 in population, 

c) schools in cities of from 2,000 to 4,999 in population, 

d) schools in cities under 2,000 in population, and 

e) rural;

and by determining the proportion of elementary-school
teachers in Iowa in each of the five classes of school systems.
Schools were then selected at random from each class so that the
elementary teachers in the selected schools represented each class
in correct proportion to the total number of elementary
teachers in Iowa. A total of seven hundred sixty-two teachers
was selected.
The schools selected in each class were randomly divided into
two approximately equal groups. Form I was sent to all the 
teachers in each school in one group; Form II to all the teachers 
in each school in the other group. 

The returns of each form of the questionnaire were tabulated 
on the basis of two divisions in education, a) teachers with less 
than a bachelor’s degree and b) teachers with a bachelor’s degree 
or more. Three divisions according to amount of teaching 
experience were also established: a) teachers with less than five 
years of experience, b) teachers with five to ten years of expe- 
rience, and c) teachers with more than ten years of experience. 

A group of graduate students at the State University of Iowa 
also rated the questionnaire. The group consisted of sixty-one 
students taking a course in educational psychology. Half of 
this group rated Form I and the others rated Form II. 

The instructions given on each of the two forms of the ques- 
tionnaire are shown below: 


Form I 


The behavior problems tabulated below are ones which 
teachers daily see exhibited by pupils. You are asked to 
rate these behavior problems on the basis of how seriously 
you think they are likely to affect the future of the child 
who exhibits them. Rate the problems in terms of a five- 
point scale. To the left of each item are the numbers 1-5 
inclusive. Encircle the number “5” for the most serious
problems and the number “1” for the least serious problems.
Rate those falling between in accordance with the degree of 
seriousness involved. 


Form II 


The behavior problems tabulated below are ones which 
teachers face daily in school situations. You are asked to 
rate these behavior problems on the basis of how much 
trouble they cause teachers in coping with them in school 
situations. Rate the problems in terms of a five point scale. 
To the left of each item are the numbers 1-5 inclusive. 
Encircle the number “5” for the problems that cause 
teachers the most trouble day in and day out, and the 
number “1” for the least troublesome problems. Rate 
those falling between in accordance with the degree to which
they are troublesome. 


The total number of returns from teachers was three hundred 
eighty-five, one hundred ninety returns of Form I and one 
hundred ninety-five returns of Form II. 


RESULTS 


Mean ratings and standard deviations of ratings were cal- 
culated for Form I and Form II main groups and for the various 
sub-groups. 

The data were treated by two methods. In the first, the 
behavior problems were arranged in rank order for the Form I 
and Form II groups, the problem receiving the highest rating 
placed first, etc. Rank difference correlation coefficients were
then calculated between the rank arrangements of the ratings 
by the groups which were to be compared. In the second 
method, the mean rating of each behavior problem was computed 
for the various groups and a test of significance of the difference 
between mean ratings of the groups was applied. 
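
The first method can be sketched in a few lines. The example below assumes the classical rank-difference formula, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which holds when there are no tied ranks; how ties were handled in the study is not stated. The function names and the five mean ratings are hypothetical.

```python
def ranks_from_means(means):
    """Rank items so that the highest mean rating receives rank 1.

    Assumption: ties are broken by original order; the study does not say
    how tied mean ratings were ranked.
    """
    order = sorted(range(len(means)), key=lambda i: -means[i])
    ranks = [0] * len(means)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def rank_difference_correlation(ranks_x, ranks_y):
    """Rank-difference correlation for two untied rankings of the same items."""
    n = len(ranks_x)
    if n != len(ranks_y) or n < 2:
        raise ValueError("need two rankings of the same length (at least 2)")
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1.0 - 6.0 * sum_d_squared / (n * (n * n - 1))


# Hypothetical mean ratings of five problems under the two forms.
form_1_means = [4.6, 4.1, 3.8, 2.9, 2.2]
form_2_means = [2.0, 2.4, 3.9, 3.1, 4.4]

rho = rank_difference_correlation(ranks_from_means(form_1_means),
                                  ranks_from_means(form_2_means))
print(rho)
```

With the full list of fifty problems, the same two functions would be applied to the fifty mean ratings from each form.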

The first comparison was between the ratings of all the teachers 
who rated Form I (seriousness to the future adjustment of the 
child) and the ratings of all the teachers who rated Form II 
(troublesomeness to the teacher). The rank difference correla- 
tion obtained between the rank arrangements of the ratings of 
the problems by these two groups was .05. This coefficient is 
not significant.* Thus, the teachers rated the problems differ- 
ently under the two sets of instructions. The mean ratings on 
Form I averaged .78 higher than the means of the ratings on 
Form II. Evidently the teachers considered the problems in 
general to be more serious than troublesome. 

Examination of the rank order of the ratings of the traits by 
various groups revealed certain interesting trends. Presented 
below are the ten problems rated highest by a) the total group of 
teachers who rated Form I and b) the total group of teachers 
who rated Form II. To the right of each trait is shown the rank 
it received on the other form of the questionnaire: 





* G. R. Thornton, “The Significance of Rank Difference Coefficients of
Correlation,” Psychometrika, 8, 1943, p. 211.
Teachers Rating Form I                  Form II Rank
(Seriousness to Child)
 1. Stealing                                 40
 2. Untruthfulness                           24
 3. Unreliableness                           14
 4. Cruelty and bullying                     30
 5. Cheating                                 17
 6. Heterosexual activity                    46
 7. Impertinence                             28
 8. Impudence                                13
 9. Selfishness                              12
10. Laziness                                 11

Teachers Rating Form II                 Form I Rank
(Troublesomeness to Teacher)
 1. Interrupting                             41
 2. Carelessness in work                     17
 3. Inattention                              31
 4. Restlessness                             45
 5. Silliness, ‘smartness,’ etc.             39
 6. Whispering and note-writing              48
 7. Tattling                                 36
 8. Thoughtlessness                          35
 9. Disorderliness                           26
10. Inquisitiveness                          47


It is immediately evident that the problems that the teachers 
considered to be most detrimental to the future adjustment of 
the child were the types of behavior associated with violations 
of our social and moral code, that is, those things which we constantly
say that “good boys and girls do not do.”

The first ten problems in rank order of seriousness as rated by 
Wickman’s mental hygienists are (numbers in parentheses show 
rank order of ratings of these same traits by Form I group): 1) 
Unsocial, withdrawing (21), 2) Suspiciousness (40), 3) Unhappy, 
depressed (12), 4) Resentfulness (24), 5) Fearfulness (33), 6) 
Cruelty and bullying (4), 7) Easily discouraged (11), 8) Suggesti- 
ble (19), 9) Overcritical of others (25), 10) Sensitiveness (43). 

Problems rated 41-50 by the Form I group follow (rank
order assigned these problems by the mental hygienists is in
parentheses): 41. Interrupting (48), 42. Shyness, bashfulness
(14), 43. Sensitiveness (10), 44. Dreaminess (18), 45. Restlessness
(40), 46. Enuresis (27), 47. Inquisitiveness (44), 48.
Whispering and note-writing (50), 49. Tardiness (43), 50.
Imaginative lying (33).

While teachers who were asked to rate problems in terms of 
seriousness to the child’s future adjustment rated them differ- 
ently from teachers who were asked to rate problems in terms of 
troublesomeness, they still did not rate these problems in the 
same way the mental hygienists did. Apparently the teachers 
were not thinking in terms of psychological categories. Those 
problems which the teachers considered to be most troublesome, 
as shown by those who rated the problems under Form II instruc- 
tions, are representative of behavior which frustrates the teacher 
in trying to educate children. Included are behavior which 
disturbs the classroom and behavior relating to lack of applica- 
tion to school work. 

Incidentally, the ratings by the group of graduate students on
seriousness to the child resembled quite closely those of Wickman’s
mental hygienists.

Examination of the ratings by the Form I and Form II groups 
showed that neither amount of formal education nor years of 
experience made a great deal of difference in the way the problems 
were rated. The correlations between rank arrangements of 
mean ratings of problems by the various subgroups on Form I 
and on Form II were uniformly low, ranging from −.11 to .06.
On the other hand, the rank difference correlation between the 
rank arrangements of the ratings of the behavior problems by 
teachers with less than a bachelor’s degree and the ratings of the 
problems by the teachers with a bachelor’s degree or more was 
.87. Boynton’s table¹ shows a rank difference coefficient of 
such a size to be very significant. This shows that teachers with 
less than a bachelor’s degree tended to rate the problems in the 
same way as those with a bachelor’s degree or better. However, 
the teachers with a degree or more rated eight problems signifi- 
cantly more serious than did the teachers with less than a 
degree. These problems were certain of the problems describing 
withdrawing and recessive traits, and certain of the immoralities 
and dishonesties (stealing, untruthfulness, and heterosexual 
activity). None of the problems were rated significantly more 
serious by the teachers with less than a degree than by the 
teachers with a degree or more. 

Comparison of the ratings of the various groups who rated 
Form II (troublesomeness to the teacher) failed to reveal any 
systematic differences in the relative degree of troublesomeness 
assigned to the behavior problems by teachers with varying 
degrees of experience and education. 
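
The rank difference coefficient referred to above can be illustrated with a short sketch. It assumes the familiar Spearman formula, rho = 1 - 6(sum of d squared) / [n(n squared - 1)], which applies when no ranks are tied. The function name and the two five-item rankings are invented for illustration and are not data from this study.

```python
def rank_difference_correlation(ranks_a, ranks_b):
    """Spearman rank-difference coefficient for two untied rankings
    of the same items (assumed formula: 1 - 6*sum(d^2) / (n*(n^2 - 1)))."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("both rankings must cover the same items")
    n = len(ranks_a)
    if n < 2:
        raise ValueError("at least two items are needed")
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical rank orders given to five problems by two subgroups.
subgroup_a = [1, 2, 3, 4, 5]
subgroup_b = [2, 1, 3, 5, 4]

print(round(rank_difference_correlation(subgroup_a, subgroup_b), 2))  # prints 0.8
```

A coefficient near 1 means the two subgroups placed the problems in nearly the same order, which is the sense in which the .87 reported above indicates agreement.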


CONCLUSIONS 


The following conclusions seem to be substantiated by the 
data: 

1) The teachers in this sample who were instructed to rate 
the behavior problems in terms of seriousness to future adjust- 
ment of children displaying them did not rate them the same way 
as did those who were asked to rate them on troublesomeness in 
classroom situations. However, neither group rated them very 
well from a psychological standpoint. 

2) Amount of education seems to make some difference in the 
attitudes of teachers toward the seriousness of certain behavior 
problems in relation to the future of the child. Teachers with 
varying amounts of experience differ little in their attitudes 
toward the seriousness of behavior problems to the future adjust- 
ment of the child exhibiting the problems. 

3) The fact that the teachers rated honesty, social morality, 
and sexual morality high would seem to indicate that those traits 
of virtue which our society has always been most concerned with 
are more important to teachers than the personality traits which 
indicate the state of a child’s personal adjustment. 


AN EXPERIMENTAL COMPARISON OF TEST-RETEST 
AND INTERNAL CONSISTENCY ESTIMATES OF 
RELIABILITY WITH SPEEDED TESTS* 


ALEXANDER G. WESMAN 
The Psychological Corporation 


JOHN P. KERNAN 


Dunlap and Associates, Inc. 


It has been pointed out by a number of writers, including 
Cronbach (2), McNemar (5), Thorndike (6), and others that the 
use of internal consistency methods in the estimation of the 
reliability of speeded tests results in spuriously high coefficients. 
Nevertheless, test builders persist in utilizing internal consistency 
formulae with such tests and, indeed, frequently compound the 
error by claiming that the estimates are unduly low.† There are 
relatively few demonstrations in the literature comparing the 
coefficients resulting from internal consistency methods with 
those obtained by test-retest techniques. Bennett, Seashore, and 
Wesman present the results of one such experiment in the Differential 
Aptitude Tests Manual (1) which reports split-half and 
test-retest coefficients for a simple clerical test; the split-half 
coefficients are shown to be consistently (and sometimes dra- 
matically) higher. 

* This paper utilizes some of the findings from Kernan, J. P., An Empirical 
Determination of Test Reliability by Different Experimental Designs, Fordham 
University, New York, 1950, an M.A. dissertation done under the mentorship 
of Wm. J. E. Crissy. 

† Such claims apparently stem from a statement made by Kuder and 
Richardson that violation of any assumptions underlying their Method of 
Rational Equivalence would yield underestimates of reliability (4). 

The spurious character of internal consistency coefficients is 
obviously in part a function of the degree to which the tests are 
speeded, that is, the extent to which subjects who could have 
correctly answered later items in the tests failed to reach those 
items. If the items are all so easy that everyone could correctly 
answer them if given unlimited time, the internal consistency 
coefficient under speed conditions should approach 1.00 regardless 
of the size of the test-retest reliability. If, on the other 
hand, the items are heterogeneous in difficulty for the group 
and are arranged in order of increasing difficulty, the spuriousness 
of the internal consistency coefficient will not be so great, 
depending in part on the extent to which speed and power are 
intercorrelated. 
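
The limiting case described above, a test whose items are all so easy that the score is simply the number of items reached, can be sketched with a small simulation. Everything in it is an assumption made for illustration: the working-speed range, the day-to-day fluctuation, the odd-even split with Spearman-Brown step-up, and the function names. It is not a reanalysis of any data in this paper.

```python
import random


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_pure_speed_test(n_examinees=500, seed=0):
    """Return (split-half estimate, test-retest coefficient) for a
    hypothetical pure speed test on which every reached item is correct."""
    rng = random.Random(seed)
    odd, even, first, second = [], [], [], []
    for _ in range(n_examinees):
        speed = rng.randint(10, 40)             # stable working speed (assumed)
        reached_1 = speed + rng.randint(-8, 8)  # items reached, first testing
        reached_2 = speed + rng.randint(-8, 8)  # items reached, second testing
        # All reached items are answered correctly, so the odd and even
        # half-scores can never differ by more than one point.
        odd.append((reached_1 + 1) // 2)
        even.append(reached_1 // 2)
        first.append(reached_1)
        second.append(reached_2)
    r_halves = pearson(odd, even)
    split_half = 2 * r_halves / (1 + r_halves)  # Spearman-Brown step-up
    retest = pearson(first, second)
    return split_half, retest


split_half, retest = simulate_pure_speed_test()
print(f"split-half estimate: {split_half:.2f}")
print(f"test-retest:         {retest:.2f}")
```

Because the two half-scores are tied to the same number of items reached, the split-half estimate stays close to 1.00 however much performance fluctuates between testings, while the test-retest coefficient reflects that fluctuation.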

The correlation between speed and power probably varies from 
one content area (or one skill) to another. It is the writers’ 
belief that a steeply graded test in mathematics, for example, will 
show very high correlation between speed and power, whereas in 
English literature or social studies, the correlation between speed 
and power is not likely to be so high. Mathematics as a disci- 
pline has a vertical hierarchy of knowledge; the student who can 
solve problems in trigonometry can almost certainly solve those 
in elementary algebra and is likely to arrive at the answers to the 
lower level problems quickly. There is no such assurance that a 
student who knows Eighteenth Century literature is equally con- 
versant with modern drama or oriental literature; the student 
may puzzle for a long time over a question which has little difficulty 
for the group as a whole. These hypotheses are the product 
of the senior writer’s experience in building tests, rather than 
being based on clear-cut experimental evidence. 

It seems evident, then, that a number of empirical comparisons 
of test-retest and internal consistency coefficients are needed. 
We need to know more about the extent of artificial inflation of 
the reliability coefficient under varying conditions. How great 
is the overestimate if ninety per cent finish the test—or eighty 
per cent, or seventy per cent? What assumptions can we make 
with tests in Biology—in Reading Comprehension—in Logical 
Reasoning or Productive Thinking? How do split-half coeffi- 
cients compare with methods of rational equivalence in each of 
these circumstances? The present study is intended to contribute 
data which, when coordinated with the results of many 
similar experiments, will shed some light on these topics. 

The test used was The Psychological Corporation’s General 
Clerical Test (3). It consists of nine parts which yield three sub- 
total scores and a total score. The parts are: Checking and 
Alphabetizing (Clerical subscore); Arithmetic Computation, 
Error Location, and Arithmetic Reasoning (Numerical sub- 
score); and Spelling, Reading Comprehension, Vocabulary and 
Grammar (Verbal subscore). These parts are all more or less 
speeded; they vary also in the homogeneity of the difficulty levels 
of the items. The test was administered to 197 twelfth-grade 
commercial high school students, a month intervening between 
testings.* Thirty-nine of the students were boys, 158 girls; they 
ranged in age from sixteen to nineteen years. 


TABLE I.—TEST-RETEST RELIABILITY COEFFICIENTS ON THE GCT 
SCORES AND THE MEANS AND SD’s OF SCORES ON THE 
FIRST AND SECOND TESTINGS 

                       Max.          1st Testing      2nd Testing
                      Poss.
Test Parts            Score      r     Mean      SD     Mean      SD

Part I                   19    .59     7.70    2.63     9.69    3.28
Part II                  61    .87    29.05    9.07    35.18    9.43
Clerical Subscore        80    .87    36.75   10.57    44.87   11.38
Part III                 20    .67    11.55    2.66    12.88    2.82
Part IV                  20    .67    11.17    4.91    14.64    4.44
Part V                   16    .76     5.64    2.56     6.40    2.93
Numerical Subscore       56    .82    28.35    8.16    33.92    8.36
Part VI                  29    .88    19.72    5.42    20.44    5.28
Part VII                 14    .65     7.44    2.85     9.23    2.59
Part VIII                40    .86    19.15    6.04    20.96    5.73
Part IX                  24    .68     9.99    3.32    11.20    3.25
Verbal Subscore         107    .91    56.30   13.08    61.83   13.08
Total Score             243    .94   121.41   25.42   140.62   26.33

* We are grateful to Dr. Robert E. Carey, Director of Guidance, Yonkers, 
New York, for his cooperation in making these students available to us. 

Table I presents the test-retest coefficients for each of the nine 
parts, the three subscores, and the total score, together with the 
respective means and standard deviations. The consistent 
increase in score on the second testing is to be expected; since the 
same form of the test was used, practice effect may be present, 
and learning also might well have occurred during the period 
between testings. Table II presents these same coefficients 
together with three sets of internal consistency estimates: 
split-half and Kuder-Richardson formulas 20 and 21.* 
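
For readers who wish to see how estimates of these three kinds are computed from item responses, a minimal sketch follows. It uses the standard textbook forms of Kuder-Richardson formulas 20 and 21 and an odd-even split stepped up by the Spearman-Brown formula; the exact split-half procedure used by the writers is not stated here, so that choice is an assumption. The function names and the tiny response matrix are hypothetical.

```python
def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def _mean_and_variance_of_totals(responses):
    """Mean and (population) variance of examinees' total scores."""
    totals = [sum(row) for row in responses]
    mean = sum(totals) / len(totals)
    variance = sum((t - mean) ** 2 for t in totals) / len(totals)
    return mean, variance


def kr20(responses):
    """K-R No. 20: (k/(k-1)) * (1 - sum(p*q) / variance of totals)."""
    k, n = len(responses[0]), len(responses)
    _, variance = _mean_and_variance_of_totals(responses)
    sum_pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in responses) / n  # proportion passing
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / variance)


def kr21(responses):
    """K-R No. 21: as No. 20, but treating all items as equally difficult."""
    k = len(responses[0])
    mean, variance = _mean_and_variance_of_totals(responses)
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))


def split_half(responses):
    """Odd-even split-half coefficient with Spearman-Brown step-up."""
    odd = [sum(row[0::2]) for row in responses]
    even = [sum(row[1::2]) for row in responses]
    r = pearson(odd, even)
    return 2 * r / (1 + r)


# Hypothetical responses: five examinees (rows) by four items (1 = correct).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 2), round(kr21(responses), 2), round(split_half(responses), 2))
```

Formula 21 needs only the mean, the variance, and the number of items, which is why it is convenient; it equals formula 20 when all items are equally difficult and falls below it otherwise.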

A number of interesting observations may be made with regard 
to the data in Table II. One generalization is that Kuder- 
Richardson formulas do not always underestimate. In all but 
one instance the reliability estimates obtained by formula 20 are 
larger than the test-retest coefficient (though not always sig- 
nificantly so). K-R No. 21, on the other hand, exceeds the test- 
retest coefficient in only one instance (Part IV-Error Location). 
We may note also that in four instances (Parts I, III, V, and IX) 
K-R No. 21 is too conservative; it underestimates reliability to 
an extent which would cause important differences in judgment 
about the test part. 

The split-half estimate also exceeds the test-retest coefficient 
for most of the subtests, sometimes by more than K-R No. 20, 
sometimes by less. Just as with K-R No. 20, the split-half esti- 
mates are sometimes quite close approximations to the test-retest 
coefficients, but sometimes they are gross exaggerations. The 
fact that in all but one comparison (for Part IX) the difference is 
no greater than .03, demonstrates the closeness of the split-half 
and K-R No. 20 estimates. 

Several points seem to be suggested by the data: 

1) K-R No. 20 and split-half estimates resemble each other 
very closely, at least with these test parts. 

2) It has once again been demonstrated that the K-R for- 
mulas do not always underestimate the ‘true’ reliability. They 
may, in fact, yield gross overestimates. 

* The Formulas used were: 

TABLE II.—TEST-RETEST, SPLIT-HALF, K-R No. 20 AND K-R 
No. 21 RELIABILITY COEFFICIENTS 

                   Test-    Split-       K-R       K-R
Test Parts        retest      half    No. 20    No. 21

Part I               .59       .71       .71       .36
Part II              .87       .99       .96       .83
C. Subscore          .87       .97       .95       .83
Part III             .67       .69       .69       .33
Part IV              .67       .91       .92       .84
Part V               .76       .68       .71       .47
N. Subscore          .82       .89       .90       .81
Part VI              .88       .86       .89       .81
Part VII             .65       .83       .81       .62
Part VIII            .86       .87       .88       .48
Part IX              .68       .57       .69       .47
V. Subscore          .91       .91       .92       .85
Total Score          .94       .96       .96       .91

3) K-R No. 21 generally provides an underestimate of reliability, 
which is usually assumed good since the error is on the side 
of conservatism. Practically, however, the use of this formula 
may be quite dangerous, since the underestimate may be so great 
as to deceive the test constructor with regard to appropriate 
action. For example, the test constructor might consider the 
development of Part III to a point at which its reliability was 
.80. If he had before him only the estimate provided by K-R 
No. 21, he would be led to believe that a test part eight times 
the length of the present Part III was needed, and he might well 
abandon the project.* Yet the evidence from the test-retest 
coefficient is that the part would need to be only twice its present 
length to attain a reliability of .80. Similarly, for Part V, the 
K-R No. 21 estimates indicate that a part ten times as long 
would be required to reach .90; the test-retest coefficient suggests 
that to reach .90, the part would have to be only three times as 
long as at present. The practical decisions based on these methods 
may well be quite opposite to each other. 

4) Inspection of the test content in this study yields no clear 
insight with regard to the effect of the materials on the spurious 
results of internal consistency techniques. Part I appears to be 
most homogeneous in content and range of difficulty; it is probably 
the ‘purest’ of the nine parts as a speed test. K-R No. 20 and 
split-half methods overestimate the test-retest coefficient by about 
.12; K-R No. 21 underestimates by .23. Part II, which is almost 
as simple a task (Part I requires comparing names and numbers; 
Part II requires a knowledge of the alphabet) and seemingly 
equally homogeneous in content and range of difficulty, is 
overestimated by split-half and K-R No. 20 by about .10; with this 
part, however, K-R No. 21 underestimates by only .04. Part 
III, an arithmetic computation test, shows essentially the same 
coefficient for K-R No. 20, split-half and test-retest, while K-R 
No. 21 underestimates appreciably; Part IV, an arithmetic error 
location test, is overestimated by split-half, K-R No. 20 and 
K-R No. 21 estimates. 

* The familiar Spearman-Brown formula for estimating the increased 
length of test required to achieve a desired coefficient is: 
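
The lengthening figures quoted in point 3 can be checked with a brief sketch. It assumes the usual form of the Spearman-Brown relation solved for length, n = R(1 - r) / [r(1 - R)], where r is the present coefficient and R the desired one; the function name is hypothetical, and the coefficients are those given for Parts III and V in Table II.

```python
def length_factor(present_r, desired_r):
    """Number of times a test must be lengthened to raise its reliability
    from present_r to desired_r (Spearman-Brown relation solved for n)."""
    for value in (present_r, desired_r):
        if not 0 < value < 1:
            raise ValueError("coefficients must lie strictly between 0 and 1")
    return desired_r * (1 - present_r) / (present_r * (1 - desired_r))


# Part III, target .80: K-R No. 21 estimate (.33) versus test-retest (.67).
print(f"{length_factor(0.33, 0.80):.1f}")  # prints 8.1
print(f"{length_factor(0.67, 0.80):.1f}")  # prints 2.0

# Part V, target .90: K-R No. 21 estimate (.47) versus test-retest (.76).
print(f"{length_factor(0.47, 0.90):.1f}")  # prints 10.1
print(f"{length_factor(0.76, 0.90):.1f}")  # prints 2.8
```

These values agree with the "eight times," "twice," "ten times," and "three times" cited in the text, and show how strongly the required length depends on which estimate is taken as the starting coefficient.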

It will have been obvious to the reader that our treatment has 
considered the test-retest estimate of reliability as the criterion 
against which the spuriousness of internal consistency coefficients 
has been judged. This procedure is in general accord with pres- 
ent measurement theory; it should nonetheless be recognized 
that our criterion itself is not perfect, even with these tests and 
this population. Since the same form of the test was used twice, 
there may have been enough memory of previous response to add 
a touch of spuriousness to the test-retest coefficient. On the 
other hand, an interval of a month might quite possibly be long 
enough for differential growth to have taken place in some of the 
abilities measured, which would depress the coefficient. 

It is the writers’ belief that neither of these factors was seriously 
operative in the experimental situation. But the possibilities are 
at least real enough to lead to the question as to how much time 
ought to elapse in order to yield the best estimate of reliability 
for each kind of test. This question leads naturally to a con- 
sideration of what aspect of reliability concerns us. If we are 
concerned with what Cronbach (2) has called the ‘coefficient of 
equivalence,’ i.e., how precisely the test measures the person’s 
performance at the particular moment, our best estimate for 
speeded tests would be obtained only by having two forms 
administered in a single day. When only one form is available 
this is impossible; and administration of the same form twice in 
the same day would, except in the rarest of instances, result in 
an ambiguous reliability coefficient. How much time, then, 
should elapse? Only a general answer is possible: enough time 
to eliminate the influence of memory and immediate practice 
effect, and too little for changes in the measured traits to occur. 
A series of studies investigating this principle with specific tasks 
and specific groups could contribute valuably to our understand- 
ing of reliability. Until enough such investigations have been 
conducted, we shall have to continue to rely on our experience and 
our intuitions. 

Spurious coefficients also deserve empirical study. There has 
been considerable progress recently in the theoretical develop- 
ment of estimates of spuriousness present in internal consistency 
coefficients derived from speeded tests. Unfortunately, all too 
often the assumptions which are required for the theoretical 
developments are hard to accept. Until such time as there is 
general agreement as to the acceptability of measures of spurious- 
ness, empirical findings such as those reported in this paper can 
provide useful clues for the everyday practitioner. 


THE FLEXIBILITY OF READING RATE 


LAWRENCE W. CARRILLO and WILLIAM D. SHELDON 
Reading Laboratory, Syracuse University 


In recent years much emphasis has been placed upon increasing 
the speed of reading in various reading clinics throughout the 
country. The writers’ experience has shown that speed is one 
of the aspects of reading in which students feel most limited. In 
some cases, however, the tendency in teaching would seem to be 
toward having the student cover the printed page as rapidly as 
possible, with little attention to the meaning of the material. 
This approach is, of course, contrary to all reason. The aim of 
reading should be to further understanding, and the amount of 
time required for this understanding should vary with the pur- 
pose of the reader and the difficulty level of the material being 
read. 

It is suspected that much of the misdirection of attack on the 
problem of increasing reading speed is due to the teacher inter- 
pretation of reading rate scores on standardized tests of reading. 
The teacher accepts the standardized test speed score as a true 
appraisal of the reading speed of the student. Accepting this, 
he proceeds to teach toward improving ‘speed of reading’ as 
exemplified by this test. This is an acceptable procedure only 
if the speed of reading test used is constructed on the basis of the 
best reading theory. Most of these tests, at present, would seem 
to fall far short of accepted reading theory. 

It may also be possible that the emphasis on experiments 
concerned with the development and nature of speed of reading 
for students at the college level has had some unexpected effects 
on the teaching of reading. The experiments concerned with 
the relationship between speed and comprehension have caused 
some misinterpretation. This has resulted in teaching for speed 
only with the idea that comprehension will increase with the 
increase in speed. 

The purpose here, then, is to present the following: 1) The 
accepted point of view regarding rate of reading, 2) A short dis- 
cussion of research findings concerned with the relationship 
between rate and comprehension, 3) A discussion of the approach 
used in standardized tests to determine rate, 4) The writers’ 
suggestions for a design for a more adequate testing instrument, 
and 5) Suggestions for teachers until such an instrument is 
developed. 


THE ACCEPTED POINT OF VIEW 


In professional books, intended for teachers of reading, the 
following quotations appear to be typical: “Varying the rate of 
reading and the skills employed is an important achievement 
and, therefore, facility in this respect should be appraised . . . 
Rate is influenced not only by the level of readability of the read- 
ing matter, but also by the familiarity of the content, by the types 
of reading ranging from the cursory to the intensive, by interest, 
and by other factors.” ? (p. 461) 

“A rate of reading which is ideal for some purposes may be 
inappropriate for others. Some types of reading matter should 
be read quickly, while others need to be read deliberately and 
carefully.” 7 (p. 447) 

“Speed in itself has no value. It has no worth when divorced 
from understanding. Speed should be thought of and taught as 
‘speed of understanding adequately.’ Every pupil should learn 
to adjust his speed of reading in a given situation to the purpose 
for which he is reading and to the difficulty of the reading matter 
which he has at hand. . . He should have several speeds, each 
to be used, as needed, within the limits of adequate understand- 
ing.” * (p. 110) 

Other authors express similar ideas. The basic principles 
voiced by all might be summarized as follows: The mature reader 
is the adaptable, versatile reader; he should be able to adapt 
his rate of reading to the purpose with which he approaches 
the printed page, and to the difficulty level of the material. 
The goal is understanding at an adequate level. 


READING SPEED AND ITS RELATIONSHIP TO COMPREHENSION 


In the periodical literature a wealth of research on reading 
speed and its relationship to comprehension has been reported. 
The results of these studies are extremely varied. In one article 
the range of correlations reported was from —.47 to .92.3 Most 
of the correlations are positive but low, averaging .30.5 

Recent studies have become quite critical of the techniques 
used in earlier investigations.'!** It is usually pointed out 
that the various instruments used to measure reading speed have 
limitations, and that dissimilar materials presented do not yield 
comparable scores. 

When independent measurements of rate and comprehension 
based on the same passages or comparable passages have been 
made, the correlations have tended to be significantly positive, 
but low.*:§ 

Experiments have shown that increasing the speed of reading 
will most likely result in a decrease in comprehension.'!®® A 
significant conclusion of one study* was that students who comprehend 
well adjust rate by slowing down as materials increase 
in difficulty, while those who comprehend poorly apparently 
read easy or difficult material at much the same rate. This is 
supported by another study*® in which correlations of equal 
magnitude (.29 and .31) were found by separating easy and 
difficult items and correlating rate and comprehension on each. 
Investigation has also shown a negative relationship between 
speed and comprehension in certain fields, especially in mathe- 
matics and science.' 

The writers would interpret the varied results from these and 
other studies in the following manner: 

1) Insofar as the approach to reading is flexible, rate and com- 
prehension will vary together. That is, if the reader has difficulty 
in comprehending, he will (if he is an adequate reader) slow down 
to understand. Reading rate should vary as the result of varia- 
tions in the comprehending functions. 

2) One of the factors in all these studies which might cause 
such confusion is that there are many students, even at the college 
level, who read with what amounts to one inflexible rate. These 
students would tend to lower all correlations between rate and 
comprehension, since they do not vary. This would be true, 
even if rate and comprehension are measured separately on the 
same passages. 

3) Considering the two above conclusions, it is possible that 
the low positive correlations arrived at in most studies represent 
the normal amount of flexibility in the populations sampled. 

4) Additional studies concerning the degree of correlation 
when reading for different purposes, and when reading different 
types of material at varying levels of difficulty would be enlighten- 
ing. Until this is done, any over-all interpretation of results 
found thus far can be mere speculation. 
















302 The Journal of Educational Psychology 


STANDARDIZED TESTS GIVING SPEED SCORES 


A number of tests have been published which attempt to 
measure speed of reading.‘ However, if one remains cognizant 
of the principle of flexibility or versatility of rate of reading com- 
prehension, these tests would seem to fall short of their objectives. 

In several of these tests, a proof-reading approach is used. 
That is, the student crosses out the word in the short selection 
that does not fit with the rest. In many of the tests for speed, 
the time limit is very short. In others, the speed of non-com- 
prehending (comprehension questions missed) is included in the 
speed of comprehension score. In many of the tests, a purpose 
is not clearly stated for the reading—that is, the reader does not 
understand clearly at the beginning what type of questions he 
will be required to answer at the end. And in none of these 
tests is there any provision for a flexibility score, derived from the 
student’s differing approach to differing material or purpose. 

It is felt that this is superficial testing of a rather complex skill, 
and that this approach has much to do with the methods of 
teaching this skill. If the results of standardized tests are to 
be used as an aid in diagnosis, the limitations of these instruments 
should be understood by all concerned. A test in this area 
should show regard for: 1) The basic idea that reading is not 
reading unless understanding results, 2) Testing of rate should 
be done with an instrument providing for a usual and natural 
reading situation, 3) Purpose must be clearly established, and 4) 
Provision must be made for a measurement of the flexibility of 
the testee’s approach to different reading situations. 


DISCUSSION 


A major problem, then, seems to be that we have no instru- 
ment suitable to check our objective of developing flexibility, 
and therefore have a tendency to ignore this phase of reading 
instruction. Actually, however, this is the crux of the problem 
of reading rate. 

Such types of practice or teaching as speed drills, short exposure 
devices, training in rapid reading, and provision for reading of 
much simple material are highly questionable. That is, they are 
questionable unless the student is also aided in or is able to 
develop for himself some versatility of approach. The provision 
of much simple material seems especially dangerous, since this 
is not a normal academic situation, and may lead to careless 
study. Again, there should be some way of checking upon the 
development of versatility or flexibility, otherwise actual damage 
may be done, even though improvement is found on a post- 
test of reading speed of the presently available type. 


SUGGESTED DESIGN FOR A TESTING INSTRUMENT 


An attempt will be made here to present a tentative design 
for a test of reading flexibility. It is realized that suggestions 
of this type are merely a first step in development, and that the 
succeeding steps are fraught with difficulty. However, the 
need seems great enough so that at least an attempt should be 
made. The suggestions here are not intended to be inclusive, 
but it is hoped that they may point out, at least partially, the 
design. 

1) The test will, of necessity, be a much longer test than 
most speed-of-reading tests. This is essential for reliability, 
especially considering the variability of material that should be 
presented. It is also important in providing the normal reading 
situation. 

2) Each exercise provided should be straight reading of the 
type found in normal reading, though variation should be pro- 
vided in the way of subject matter and typographical cues. The 
exercises should not be merely two or three sentences on a single 
paragraph. They should probably be of at least four hundred 
words in order that individual differences will show up clearly 
in the timing. 

3) At the beginning of each selection, a purpose in the reading 
of that selection should be established. This, it is realized, 
cannot be done completely, since each reader will have his own 
individual purpose which cannot be controlled. But at least 
he should be given something definite to answer for at the end 
of the reading. Probably this purpose should remain as constant 
as possible at all levels of difficulty. 

4) The difficulty level of each selection should be established. 
This may be done fairly well by several methods now in use. 
Several selections in various fields should be given at each 
difficulty level, since the student should not be expected to perform 
at the same level in all areas. If the test is designed for 
college students, two or three selections at sixth-grade level 
would be a good starting point; then a gradually increasing 
variability of selections at the eighth-, tenth- and twelfth-grade 
and college level should be employed. At the college level there 
may still be an increase of difficulty. 

5) Each section should be timed separately. This may be 
accomplished in a group as well as individually without too much 
difficulty to either the students or the administrator of the 
instrument. Timing by ten-second intervals would probably 
be close enough for all practical purposes. 

6) The scores obtained would be more or less on this order: 

(a) Frustration level of the student in each subject area—the 
level of difficulty at which correct answers fell below fifty per 
cent (possibly). (b) Rate of comprehension at each difficulty 
level completed accurately (at least seventy-five per cent). 
(c) An index of the flexibility or versatility of the reader—how 
much he slowed down as the material increased in difficulty. 
(d) If comprehension purpose remains the same throughout, a 
score of comprehension level for that purpose or purposes, as 
compared to norms. (e) An analysis of the type of compre- 
hension questions most often and least often answered correctly— 
what was the type of question that caused frustration at any 
level, or what was instrumental in setting the frustration level? 

This is quite an imposing order. However, an instrument in 
this direction would seem to be an improvement over the present 
situation. 
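
The scores listed under 6) may be made more concrete with a rough sketch. It is only one possible reading of the suggestions: the class and function names, the coding of difficulty as a grade number, the use of a single selection per level, the definition of the flexibility index as the proportional drop in rate from the easiest to the hardest level read accurately, and the sample figures are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Selection:
    """One timed reading exercise (all fields are hypothetical data)."""
    grade_level: int   # difficulty, coded as a school grade
    words: int         # length of the selection
    seconds: int       # reading time, recorded in ten-second intervals
    correct: int       # comprehension questions answered correctly
    questions: int     # comprehension questions asked

    @property
    def accuracy(self):
        return self.correct / self.questions

    @property
    def words_per_minute(self):
        return self.words * 60 / self.seconds


def score_reader(selections, accurate_at=0.75, frustration_below=0.50):
    """Return frustration level, rate at each accurately read level,
    and a flexibility index for one reader."""
    ordered = sorted(selections, key=lambda s: s.grade_level)
    # (a) Lowest difficulty level at which accuracy falls below fifty per cent.
    frustration = next(
        (s.grade_level for s in ordered if s.accuracy < frustration_below), None
    )
    # (b) Rate of comprehension at each level completed accurately.
    rates = {s.grade_level: s.words_per_minute
             for s in ordered if s.accuracy >= accurate_at}
    # (c) Proportional slowing from the easiest to the hardest accurate level.
    flexibility = None
    if len(rates) >= 2:
        levels = sorted(rates)
        easiest, hardest = rates[levels[0]], rates[levels[-1]]
        flexibility = (easiest - hardest) / easiest
    return {"frustration_level": frustration, "rates": rates,
            "flexibility": flexibility}


reader = [
    Selection(6, 400, 80, 9, 10),
    Selection(8, 400, 100, 8, 10),
    Selection(10, 400, 120, 8, 10),
    Selection(12, 400, 160, 6, 10),
    Selection(14, 400, 200, 4, 10),
]
print(score_reader(reader))
```

Under these assumptions an index near zero would mark the reader with one inflexible rate, while a larger positive value would mark the reader who slows down as the material becomes more difficult.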


SUGGESTIONS FOR TEACHERS 


Teachers should be able to substitute for the above outlined 
instrument. Informal testing is possible—more than that, 
desirable. Even in elementary and junior high schools the 
concept of flexibility must be brought home to the students. 

First and foremost, all teachers should be conscious of the 
flexibility of reading rate. Assignments should be made in a 
way that clearly indicates the kinds of information wanted. 
Students should be allowed to discuss ways in which the material 
must be read to achieve the desired effect. Varied materials 
should be used, even if only in one subject-matter area, and 
care should be taken to point out the variability of proper attack 
necessary for different authors. Supplemental reading at varying 
levels of difficulty and with varying purposes should be 
encouraged and specific instruction given in reading with these 
differences in mind. The students should understand what 
makes some materials more difficult to understand than others 
and why they should vary their attack. If speed drills are given, 
the students should know why the material being used for this 
may be read rapidly and still comprehended. The emphasis 
should always be upon reading to understand rather than 
hurried reading without meaning. 


A NOTE ON THE RELATIONSHIP OF HOSTILITY 
AND SOCIAL DISTANCE 


HARRY A. GRACE 


University of Illinois 


During the past few years investigations have been carried on 
by the author!** concerning the expression of hostility. These 
studies have been conducted by means of a hostility inventory.® 
Recently, a study was reported on the development and use of the 
Geo-Ethnic Preference Inventory* which is a study of social 
distance. This paper reports the results of a statistical analysis 
of the relationships between scores on these two inventories. 

The hypothesis under which this study was conducted is as 
follows: If prejudice has as two of its component parts hostility 
and social distance, then the expression of hostility by an individ- 
ual will be related to his perception of social distance. 


METHOD AND PROCEDURE 


The hostility inventory was administered to one hundred 
thirty-five undergraduate students at this University in the spring 
of 1949, and followed one month later by the Geo-Ethnic Prefer- 
ence Inventory. 

Table I presents the intercorrelations between each of the
subtests on the hostility inventory and each of the subtests on the
GEPI. None of the absolute values of the correlations exceeds
.22. The correlations range from -.22 for the relationship
between the Latin American culture and everyday
direct-heterohostility, to +.22 for the relationship between the
Anglo-Saxon culture and autohostility in student situations, and
between African culture and student verbal heterohostility.


TABLE I. INTERCORRELATIONS BETWEEN SUBTEST
SCORES ON INVENTORIES OF HOSTILITY AND SOCIAL DISTANCE


       EA*     EL     EV     ED     SA     SL     SV     SD     IA     IL     IV     ID
A†    +.11   +.02   +.13   -.01   +.22   +.15   -.04   -.07   -.06   +.17   +.18   -.03
B     -.01   -.00   -.05   +.03   +.08   -.13   +.08   +.03   -.10   -.03   +.13   +.19
C     +.18   +.14   -.21   -.17   -.02   +.18   -.20   -.02    .00   +.06    .00   -.06
D     -.11   +.04   +.17   -.22   -.04   -.05   +.13   -.06   +.05   -.10   +.07   -.01
E     +.01   -.13   +.01   -.03   +.02   +.15   -.20   -.01   -.17   +.08   +.14   -.03
F     -.14   -.13   +.04   +.21   -.01   -.17   +.22   +.12   -.10   -.06   +.08   +.20
G     -.02   +.01   -.03   +.06   -.09   +.10   -.04    .00   +.05   +.06   +.16   +.03
H     -.07   -.05   +.03   +.15    .00   -.19   +.18   +.16   +.14   -.20    .00   +.07
I    -.005   +.06   -.01   -.09   +.02   -.01   +.08   -.02   +.20   +.01   -.19   -.18
J     -.02   +.03   +.10   +.04   +.06   +.09   -.08   -.14   +.04   +.05   -.10   +.02


* E, everyday situations; S, student situations; I, international situations; A, autohostile
(self-hostile); L, laissez faire (no hostility expressed); V, verbal-heterohostile (verbally
hostile to others); D, direct-heterohostile (directly hostile to others).

† A, Anglo-Saxon culture (United States, British Commonwealth); B, Hindu Southeastern
Asian; C, Semitic, Arabic, Hebraic; D, Latin American; E, Latin European; F, Negro African;
G, German, Scandinavian; H, Sinic, Japanese; I, Slavic, Eastern European; J, Imaginary.
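
As a rough check on how a coefficient of .22 behaves among so many comparisons, the following sketch applies the Fisher z approximation and a Bonferroni adjustment to the largest coefficient in Table I. Neither technique is named in this note; both are brought in here purely for illustration. The sketch assumes that all 135 students contributed to every one of the 10 x 12 = 120 coefficients, which the text does not state explicitly.

```python
import math


def fisher_z_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0, using the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) equals twice the upper tail of the standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


N_STUDENTS = 135          # from the Method section; assumed complete for every pair
N_COEFFICIENTS = 10 * 12  # ten culture groups by twelve hostility subtests
ALPHA = 0.05              # assumed significance level, not given in the note

p_largest = fisher_z_p_value(0.22, N_STUDENTS)
bonferroni_alpha = ALPHA / N_COEFFICIENTS    # per-coefficient level after adjustment
expected_by_chance = ALPHA * N_COEFFICIENTS  # coefficients expected to pass if all rho = 0

print(f"p for |r| = .22 (unadjusted): {p_largest:.4f}")
print(f"Bonferroni-adjusted level:    {bonferroni_alpha:.5f}")
print(f"Expected chance 'hits':       {expected_by_chance:.1f}")
```

Under these assumptions the largest coefficients would pass an unadjusted test taken one at a time, but not the adjusted one, and a handful of the 120 coefficients would be expected to pass by chance alone. This is in keeping with the reading that the table shows no dependable relationship.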


DISCUSSION 


With reference to the hypothesis under test, the data do not 
support the assumption that there is a relationship between 
hostility and social distance as measured by these two inventories. 
Methodologically, this may be an important finding, for it 
indicates that these two tests tend to be independent of each 
other, and may be used as part of a battery of such tests. One 
reason which might account for the absence of significantly high 
correlation coefficients between subtests of these two inventories 
could be the presence of factors in each of the separate tests. 
Future tests formed around the factors in each of these inventories 
may indicate a more definite relationship between social distance 
and the expression of hostility. Finally, these two variables, 
hostility and social distance, may be related in a more complex 
manner than has been assumed heretofore. Research is
currently being directed toward a more definitive analysis of the 
relationship between these two variables. 


SUMMARY AND CONCLUSIONS 


1) Two paper-and-pencil inventories were administered to
undergraduate college students in the spring of 1949. The first 
was a hostility inventory and the second a measure of social 
distance. 

2) The subtest scores on each of the inventories were inter- 
correlated by means of the Pearson product-moment method. 

3) The intercorrelations indicated a range running from —.22 
to +.22. The low absolute value of such correlations suggests 
the absence of relationship between these two variables. 

4) Suggestions for interpretation of these results are given. 

A STUDY OF THE RELATIONSHIP BETWEEN
FIGURAL AFTER-EFFECT AND READING-TEST 
PERFORMANCE 


CECIL MAX FREEBURNE* 
State University, Bowling Green, Ohio 


Gibson (3,4,5,6) has demonstrated that prolonged exposure 
to a bent or curved line is followed by perception of an objectively 
straight line as being bent or curved in the opposite direction. 
Bales and Follansbee (1) report in addition that the reading of 
news-clippings during the interval between exposures of the two 
figures is associated with a greater degree of this effect than is the 
fixation of a dot on a white ground. They offered no explanation 
of this finding, but suggested that the nature of these after- 
effects is partly determined by the character of eye movements. 
Kohler and Wallach (7) called this phenomenon ‘figural after- 
effect’ and observed that it occurred under various arrangements 
of a number of geometric figures of several sizes and shapes. 
They offered an explanation of these after-effects in terms of 
satiation of portions of the cerebral cortex corresponding to 
stimulated areas of the retina. They suggested that all persons 
are at all times subject to some degree of satiation in the visual 
field, and that the degree of satiation and of susceptibility to 
satiation differ from person to person. 

The process of reading involves eye movements and short 
fixation periods. If the above-noted suggestions of Bales and 
Follansbee and of Kohler and Wallach are sound, then it appears 
that there might be some relationship between the incidence of 
these after-effects and proficiency in reading. 

It is the purpose of the present study to provide some infor- 
mation concerning the relationship between reading-test per- 
formance and the occurrence of figural after-effect. 


PROCEDURE 


Twenty-four Introductory Psychology students were given the
Iowa Silent Reading Test, Advanced Form Cm. Subsequently,
by means of a special tachistoscopic exposure device to be
described elsewhere (2), the subjects were presented with
Gibson’s straight-line and curved-line figures as practice figures,
and with Figures 2, 5, 11, 13, 14, 21, 24, 30, 31, 39, 52, 54, 55, 56,
and 58 from the monograph by Kohler and Wallach. The figures
were drawn in India ink on white cards 8 by 55¢ inches in size, and
were shown at a distance of six feet. No information was given
the subjects as to the objective nature of the figures or as to the
nature of the effects being studied. Any of the appearances
scored as positive by Kohler and Wallach were scored as positive
here also, with a score of one being assigned any figure on which
one or more after-effects occurred. The maximum possible score
for the experimental figures was fifteen.

* The writer wishes to thank Mr. Robert B. Chotoff, who drew the
figures used in the present study, and Mr. John E. Taylor, who assisted in
the collection of the data.

The inspection figure was presented for three minutes, the time 
between inspection and test figure was one second, and the test 
figure was exposed for two seconds. Each inspection- and test- 
figure pair was presented once to each subject, one a day for 
seventeen days. The figure pairs were presented to each subject 
in an order determined by use of a table of random numbers. 
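
The scoring rule and the randomized order of presentation described above can be sketched as follows. The report counts are hypothetical, the function names are illustrative only, and a seeded pseudo-random shuffle stands in for the printed table of random numbers actually used. The text does not say whether the two practice figures were included in the randomization, so the sketch orders only the fifteen experimental figures.

```python
import random

# Experimental figure numbers from the Kohler and Wallach monograph, as listed above.
FIGURES = [2, 5, 11, 13, 14, 21, 24, 30, 31, 39, 52, 54, 55, 56, 58]


def after_effect_score(effects_per_figure):
    """Score one point for each figure on which one or more after-effects occurred."""
    if len(effects_per_figure) != len(FIGURES):
        raise ValueError("expected one report count per experimental figure")
    return sum(1 for count in effects_per_figure if count >= 1)


def presentation_order(figure_ids, seed):
    """Return a shuffled copy of the figure list (stand-in for a random-number table)."""
    rng = random.Random(seed)
    order = list(figure_ids)
    rng.shuffle(order)
    return order


# Hypothetical subject: number of after-effects reported on each of the 15 figures.
hypothetical_reports = [0, 2, 1, 0, 0, 3, 1, 0, 0, 1, 0, 0, 2, 0, 1]
score = after_effect_score(hypothetical_reports)
order = presentation_order(FIGURES, seed=1)

print("score:", score, "of", len(FIGURES))
print("order:", order)
```

Because a figure earns at most one point regardless of how many after-effects are reported on it, the score is bounded by the number of experimental figures.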


RESULTS AND CONCLUSIONS 


Product-moment correlation coefficients were computed be- 
tween number of positive figural after-effects and total and 
subtest standard scores on the Iowa Silent Reading Test. The 
obtained coefficients are presented in Table I, along with means 
and standard deviations. 
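
For readers who wish to reproduce this kind of analysis, a minimal sketch of the product-moment coefficient follows. The paired scores are hypothetical and serve only to exercise the function; they are not the data of the present study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and of squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired scores: after-effect counts and reading standard scores.
hypothetical_after_effects = [2, 9, 4, 11, 6, 3]
hypothetical_reading = [168, 185, 170, 190, 176, 171]
r = pearson_r(hypothetical_after_effects, hypothetical_reading)
print(f"r = {r:.3f}")
```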

According to Lindquist (8) only one of these correlation 
coefficients is significantly greater than zero, that between 
number of figural after-effects and the Selection of Key Words 
subtest of the Iowa Silent Reading Test. None are high enough 
to permit effective prediction of reading-test scores from number 
of positive figural after-effects. 
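
The significance statement can be checked approximately with the usual t transformation of r. The sketch below assumes that all twenty-four students entered each coefficient and that a two-tailed .05 criterion applies; the criterion actually tabled by Lindquist may differ, so this is an illustration and not a reconstruction of the author's procedure. The critical value 2.074 is the standard tabled t for 22 degrees of freedom.

```python
import math


def t_from_r(r: float, n: int) -> float:
    """t statistic (n - 2 degrees of freedom) for testing H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


N_SUBJECTS = 24     # assumed: every student contributes to every coefficient
T_CRITICAL = 2.074  # two-tailed .05 critical value of t for 22 degrees of freedom

# Correlation coefficients as reported in Table I of this study.
TABLE_I_R = {
    "Rate": 0.180,
    "Comprehension": 0.388,
    "Directed Reading": -0.084,
    "Poetry Comprehension": 0.001,
    "Word Meaning": 0.204,
    "Sentence Meaning": 0.145,
    "Paragraph Comprehension": 0.244,
    "Use of the Index": 0.249,
    "Selection of Key Words": 0.436,
    "Total": 0.124,
}

significant = [
    name for name, r in TABLE_I_R.items()
    if abs(t_from_r(r, N_SUBJECTS)) > T_CRITICAL
]
print("Exceeding the assumed criterion:", significant)
```

Under these assumptions only the Selection of Key Words coefficient exceeds the criterion, which agrees with the statement in the text.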

It is a limitation upon the interpretation of the results of the
present study that only fifteen experimental figures were used.
The standard deviation of the distribution of figural after-effect
scores is quite large in relation to the mean. Ideally, one would
use a much larger number of figures, and would fully
counterbalance the order of presentation of these figures. Such a
procedure was beyond the scope of the present study. It is to be
noted that all the obtained correlation coefficients were positive
except one, and one was significantly greater than zero.
However, with respect to the use of tests for susceptibility to figural
after-effect as diagnostic tools to aid teachers of remedial reading,
wherein performance on such tests might be uninfluenced by
previously-learned reading habits or by linguistic facility, the
results of the present study must be considered to be negative.


TABLE I. MEANS AND STANDARD DEVIATIONS FOR STANDARD
SCORES ON THE IOWA SILENT READING TEST AND FOR NUMBER
OF FIGURAL AFTER-EFFECTS, AND CORRELATION COEFFICIENTS
OBTAINED BETWEEN FIGURAL AFTER-EFFECT
SCORES AND READING-TEST SCORES


                                   M        SD        r
Figural After-effect              6.55     4.32
Iowa Silent Reading Test
  Rate                          173.85    17.90     .180
  Comprehension                 181.80    16.14     .388
  Directed Reading              172.10    19.75    -.084
  Poetry Comprehension          170.90    23.04     .001
  Word Meaning                  193.35    18.58     .204
  Sentence Meaning              186.25    15.87     .145
  Paragraph Comprehension       180.60    12.99     .244
  Use of the Index              179.35    16.88     .249
  Selection of Key Words        174.90    16.02     .436
  Total                         180.15    11.33     .124

Education. 1) The Functions of Measurement in the Facilitation of
Learning. Walter W. Cook. 2) The Functions of Measurement in 
Improving Instruction. Ralph W. Tyler. 3) The Functions of 
Measurement in Counseling. John G. Darley and Gordon V. Anderson. 
4) The Functions of Measurement in Educational Placement. Henry 
Chauncey and Norman Frederiksen. 

Part Two—The Construction of Achievement Tests. 5) Preliminary 
Considerations in Objective Test Construction. E. F. Lindquist. 
6) Planning the Objective Test. K. W. Vaughn. 7) Writing the Test
Item. Robert L. Ebel. 8) The Experimental Tryout of Test Mate- 
rials. Herbert S. Conrad. 9) Item Selection Techniques. Frederick 
B. Davis. 10) Administering and Scoring the Objective Test. Arthur 
E. Traxler. 11) Reproducing the Test. Geraldine Spaulding. 12) 
Performance Tests of Educational Achievement. David G. Ryans and 
Norman Frederiksen. 13) The Essay Type of Examination. John M. 
Stalnaker. 

Part Three—Measurement Theory. 14) The Fundamental Nature 
of Measurement. Irving Lorge. 15) Reliability. Robert L. Thorn- 
dike. 16) Validity. Edward E. Cureton. 17) Units, Scores, and 
Norms. John C. Flanagan. 18) Batteries and Profiles. Charles I. 
Mosier. Brief bibliographies with most chapters. A notable bibliog- 
raphy of one hundred six titles classified by topic accompanies Chap- 
ter 9. 


After five years of editorial coaxing, the twenty authors listed
above, with the advice of fifty-one collaborators and the guidance
of an editorial committee, have brought together in one volume
a comprehensive treatment of the art, the practical technology, 
the logical theory, and the multiple purposes of educational 
measurement. 

The eighteen chapters are organized in three parts: The
Functions of Measurement in Education, The Construction of
Achievement Tests, and Measurement Theory. The three parts
discuss the questions why measure, how measure, and what
basic ideas to consider in thinking about the process and the
results of measurement in education. Since these questions
are deeply involved with the basic values and purposes of
education, it should not surprise anyone to find many of the chapter
specialists touching frequently on the same issues. The chapters,
however, fit into a planned editorial organization. The
overlapping treatment of topics reveals different applications and
points of view. The volume is much more than a collection of
papers. The editor and the authors were in cahoots; each
understood the scope of his own and the others’ responsibilities. The
result is a successful collaboration and an important addition to 
the reference and text literature in psychological and educational 
measurement. Measurement workers who deal with problems 
of instruction for advanced students or who have occasion to 
consult on measurement projects in schools will find several 
chapters useful as references on problems which arise in practice.
The chapters bring together many of the ideas, points of view,
and techniques which have been fugitive and frequently
inaccessible to students and certainly inaccessible to workers in 
the schools. 

The principal recurrent theme throughout the volume is quite 
naturally the problem of validity. In the largest sense questions 
of validity include all the questions of decision with respect to 
objectives of education. Perhaps the most important feature 
of the book is the consistency with which variations on this theme 
are found in most of the chapters. 

Chapter 1 opens the problem with the statement, “The value
of measurements depends on the extent to which relationships
established (among measures) are crucial from the social point of
view.” The chapter directs attention to the significance in
educational practice of experimental findings about individual 
differences. The theme is continued in Chapter 2, but with 
emphasis upon the process of test construction in the specification 
and selection of educational objectives. Chapter 1 emphasizes 
the implications of the substantive findings of educational 
inquiries using measurement data. Chapter 2 emphasizes the 
effects that the process of conducting a substantive inquiry has 
upon the inquirer, particularly upon teachers who become 
involved in formal evaluation activities. Chapters 3 and 4 
discuss the application of measurement data in the decision 
problems of individuals and institutions. Chapter 3 is concerned 
with the use of measurement observations by counselor and 
counselee in the individual choice of objectives. Chapter 4 
reviews the policy and measurement practice of institutions in
the selection and classification of students within the structure 
of institutional objectives as they now exist. 

A major editorial policy introduced in Part One and sustained 
throughout the volume emphasizes the direction in which 
measurement theory, professional techniques, and school admin- 
istrative practice must develop if the large goals accepted by 
measurement workers are to be achieved. Different points of 
view on theoretical and practical issues are noticed. The critical 
commentary on conventional practices is consistently construc- 
tive in suggesting problems which require attention. 

Part Two, which comprises half of the volume, is devoted to the
actual business of constructing achievement tests. Many 
readers will regard this section as the most useful portion of the 
book. The chapters present current practice, and the points 
of view of widely experienced specialists. Practically none of 
the material presented is easily available elsewhere, and much 
of it is not accessible anywhere. For example, the problems of 
reproducing test materials by various duplicating and printing 
processes, including the details of typography and format, the 
clerical routines of administering and scoring objective tests in 
large and small programs, and the problems of test editors and 
item writers in the formulation of usable test materials are here 
reviewed in elaborate, practical detail. Item analysis pro- 
cedures are comprehensively presented including a lengthy dis- 
cussion of the effects of omissions, corrections for guessing, and 
time limits on item statistics. It is refreshing to find this chapter 
opening with the statement, “By the time the tryout forms for a
test have been constructed much of the opportunity for selecting 
items has already passed.” Here, again, is an example of the 
approach to the problem of testing which is consistent throughout 
the book: the desired end-product of a test constructor is a valid 
score. The discussion of test score statistics is properly sub- 
ordinated to the issues underlying the statistical interpretations. 
The basic issue of what to test and the limitations imposed on the 
test constructor by the actual facts of school practice are clearly 
presented by the editor in Chapter 5. Part Two is closed with a 
provocative and suggestive chapter on the essay examination. 
These two chapters, 5 and 13, and Chapter 7 on Item Writing, 
could well be reprinted separately for wider circulation. 

The five chapters of Part Three consider the logical elements of
measurement theory applied to educational and psychological 
observations. The opening chapter of this part is concerned 
with (1) the idea of properties or characteristics of objects, 
(2) the process of observation, (3) the classification or grouping 
of observations, (4) the considerations underlying the use of 
numerals to denominate classes, and (5) the logical limitations 
that apply to numerical scales and to computations involving 
scaled numbers. The essay should be helpfully informative to 
many students, especially to those who have a technical facility 
with numbers and statistics built on a superficial awareness of 
the basic notions. 

Chapter 15 reviews the issues involved in reliability of scores 
with emphasis on procedural design in collecting data, com- 
putation of statistical estimates of reliability, and the definitions 
and assumptions involved in the interpretation of reliability 
statistics. Particular attention is given to the relevant factors 
in the experimental circumstances that are too often unrecog- 
nized by test authors, publishers, researchers, and consumers. 

Chapters 16, 17, and 18 discuss comprehensively the logical 
problems of determining what information a test score contains 
and the technical aspect of effective communication of the 
information a score does contain. Chapter 18 discusses the 
elements of the statistical theory in combining tests into batteries. 
Current practice and malpractice are adequately described, 
especially with respect to the use of graphic profile methods of 
reporting scores. Some attention is given to the unreliability of 
intra-individual score differences as well as to reliability of 
composite scores. 

Chapter 17 presents a full discussion of most of the technical 
topics that professional test producers and consumers regard 
as vital in the development of effective educational measurement. 
The chapter discusses the questions, (1) how to derive scores 
which will properly convey the information contained in the 
observed behavior, (2) how to locate these scores in a meaningful 
comparative context, and (3) how to communicate the most 
information in a comprehensible way. These questions introduce
problems of deriving reproducible units of measure, of choosing 
among arbitrary numerical scales, of comparability of measures, 
and of obtaining and reporting comparative or normative
experience in useful form. The discussion includes power scales,
percentiles, age and grade equivalents, sense difference and 
isochron scores, and the family of standard scores including 
Cooperative Scaled Scores. Extensive attention is given to
the problem of overlapping distributions in defining units of 
measure. The definition and experimental determination of 
equivalence of tests and comparability of scores is discussed at 
length, both as a logical problem and as an important technical 
concern to the producer and consumer. The chapter will be a 
useful mine of suggestive ideas, not only to test producers, but 
also to workers with responsibility for surveys, guidance pro- 
grams, system-wide curriculum evaluation and other inquiries 
in which measurement data are basic. 

Chapter 16 under the title ‘Validity’ differs significantly from 
the other seventeen. The bulk of the volume presents the 
technics, experience and points of view in current measurement 
thinking. The new developments presented are largely matters 
of detail, refinement, or restatement. Chapter 16 is different 
in the sense that it presents a new formulation of the problem. 
The contribution of the essay lies in the clarification of issues 
which have been obscured in a lather of correlation coefficients 
or dimly seen in the shadow of an ‘inadequate criterion.’ The 
discussion is confined to logical questions. No attention is 
given to the substantive findings of ordinary ‘validity studies.’ 

The clarification is achieved by defining ‘a set of ultimate 
criterion scores,’ and then distinguishing between logically differ- 
ent relations between criterion and test observations. In 
addition to defining a set of ultimate criterion scores, which 
must be unbiased but need not be perfectly reliable, Cureton 
defines three statistical quantities: (1) prediction power is the 
correlation between raw test scores and raw criterion scores, 
(2) validity is the correlation between raw test scores and ‘true’ 
or perfectly reliable criterion scores, (3) relevance is the correlation 
between ‘true’ test scores and ‘true’ criterion scores. The 
conceptual clarification which follows from these definitions 
is nicely revealed in four subsections entitled The Criterion, 
Logical Problems, Estimation, and Statistical Problems. The 
chapter makes a distinguished contribution to the logical formula- 
tion of the ‘validity problem.’ 
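
The three quantities can be related numerically if one adopts the classical correction for attenuation, in which a correlation with "true" scores is the raw correlation divided by the square root of the relevant reliability. The review gives no formulas, so the sketch below is an assumption about how the definitions work out under classical test theory and is not a transcription of Cureton's chapter. All numerical values and function names are hypothetical.

```python
import math


def validity(prediction_power: float, criterion_reliability: float) -> float:
    """Raw test scores vs. 'true' criterion scores (criterion unreliability removed)."""
    return prediction_power / math.sqrt(criterion_reliability)


def relevance(prediction_power: float, test_reliability: float,
              criterion_reliability: float) -> float:
    """'True' test scores vs. 'true' criterion scores (both unreliabilities removed)."""
    return prediction_power / math.sqrt(test_reliability * criterion_reliability)


# Hypothetical values chosen only for illustration.
r_xy = 0.40  # prediction power: raw test scores vs. raw criterion scores
r_xx = 0.81  # reliability of the test
r_yy = 0.64  # reliability of the criterion

v = validity(r_xy, r_yy)
rel = relevance(r_xy, r_xx, r_yy)
print(f"prediction power = {r_xy:.3f}")
print(f"validity         = {v:.3f}")
print(f"relevance        = {rel:.3f}")
```

With reliabilities below unity, the three values are ordered from prediction power (smallest) to relevance (largest), which is one way of seeing why the chapter keeps them apart.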

The book is of convenient size, well printed, sturdily bound, 
modestly indexed and not burdened with extensive, unnecessary
mathematical writing. The large number of educators and 
psychologists who need a source of information, ideas, and dis- 
cussion of the theory and the practice of educational measure- 
ment will be grateful to the American Council on Education 
which sponsored the project, to the Grant Foundation which 
made it possible, and to the editor, authors, and collaborators 
for assembling such a substantial and consistently good volume. 
C. R. LANGMUIR 


Syracuse University 


NORMAN CAMERON AND ANN MAGARET. Behavior Pathology.
Boston: Houghton Mifflin Co., 1951, pp. 645. 


Behavior pathology in this volume is defined as a study of 
those forms of human behavior which render the individual per- 
sistently tense, dissatisfied, incompetent, or ineffectual. The 
general criteria for defining pathology are: (1) marked departure 
from cultural expectation and (2) behavior which adheres 
closely to cultural expectation but still is personally unrewarding. 
The frame of reference in this volume as in Cameron’s Psychology 
of Behavior Disorders is the bio-social viewpoint. The chief 
emphasis is on social learning. Whereas in the Psychology of 
Behavior Disorders the emphasis is on understanding mental 
illnesses, the emphasis in this book is upon understanding the 
patient as an individual in the light of his biological inheritance 
and his social learning. One volume very nicely supplements 
the other. There is overlapping not only in viewpoint but in 
content and concepts considered. The book contains many 
references in the form of footnotes; fairly lengthy subject and 
name indices; and a laudatory preface by the editor of the series, 
Dr. Leonard Carmichael. 

The content is presented in fifteen chapters. Illustrative of 
content and its organization are the captions for the chapters. 
In the introductory chapter the field of behavior pathology and
the problems and methods used in studying it are considered.
The next two chapters are devoted to a consideration of behavior 
organization; one in terms of needs, stress and frustration and 
the second in terms of learning and behavior pathology. Then 
follows a series of chapters on developmental considerations 
including role-taking and emotional reactions as well as
maturation and fixation. Next are chapters on developmental
deviations, one on developmental retardation and one on social
deviation, both under the general caption of “Behavior Pathology and
Bio-social Immaturity.” Then comes a consideration of more specific
topics such as regression, withdrawal and invalidism, anxiety
in normal behavior, conflict, repression, pseudocommunity and
delusion, autistic community and hallucination, disorganization,
desocialization and deterioration. In the last two chapters
therapy is considered under captions of “Therapy in Behavior
Pathology” and “Learning and Therapy.”

Since the general viewpoint and the pragmatic approach in 
evaluating material still prevail in this as in Dr. Cameron’s 
earlier volume, the reader is referred to the review of the first 
volume in this JOURNAL (Vol. 40, No. 6, October, 1949, pp.
382-384). In this latter work the content is presented in a more
orderly and clear manner; there is less argumentative material 
than in the first volume, and less concern with re-interpretation 
of content. In the impression of the reviewer the first volume is 
likely to play a more significant role in psychological literature
merely because it appeared first. However, for textbook pur- 
poses the second volume is superior to the first. The pattern 
of emphasis in this volume resembles that in the earlier volume. 
For example, pathological lying considered in the chapter on 
social deviation has one reference to Healy, and other related 
references are neglected. In a general discussion of delinquency 
Healy is mentioned only twice, once on pathological lying and
once for his joint work with Dr. Alexander on Roots of Crime.
Even Healy’s New Light on Delinquency and Its Treatment,
where the presentation of material is definitely in line with the
viewpoint expressed by Cameron and Magaret, is not mentioned.
Even more than in the first volume, there is frequent
reference to Levy and Miller. This is understandable in the
light of the viewpoint stressed. Though the authors say that
present-day knowledge calls for integration with the social
sciences, an anthropologist like Kluckhohn gets only brief
consideration. Margaret Mead is completely omitted. The two chapters
on therapy are fairly critical. There is a definite slanting in the 
direction of non-directive approaches but there is an awareness 
that the principles of social learning which underlie both the 
development and treatment of behavior pathology can be 
translated into specific therapeutic techniques in different ways.
Both theory and pragmatic test determine the variety of therapy 
employed with a given patient. Hence, the authors do see and 
describe the most conspicuous feature of contemporary therapy 
in behavior pathology to be its diversity. In the therapeutic
relationship the patient’s anxieties are utilized in further social
learning, and the authors’ more general attitude is that no therapy
fits exactly any single theory of learning. All therapies involve
one or more of the general phenomena of learning which are to be 
found in the development of normal and pathological reactions. 

Behavior Pathology should serve as an excellent text for a 
course in abnormal psychology for students who have had nothing 
but an elementary course in psychology. It is well enough 
written to be understood by informed readers. To persons 
who are taking a series of courses in an organized, sequential manner
on the road toward becoming clinical psychologists, Behavior
Pathology can serve as a good supplement for the Psychology of
Behavior Disorders.
H. MELTZER

Psychological Service Center 

St. Louis, Mo. 





